

A FORMAL SYSTEM
FOR DEFINING THE SYNTAX AND SEMANTICS 
OF COMPUTER LANGUAGES 


by 


HENRY FRANCIS LEDGARD 


B.S., Tufts University
(1964) 
S.M., Massachusetts Institute of Technology 
(1965) 
E.E., Massachusetts Institute of Technology 
(1967) 


SUBMITTED IN PARTIAL FULFILLMENT 
OF THE REQUIREMENTS FOR THE 
DEGREE OF DOCTOR OF 
PHILOSOPHY 
at the 
MASSACHUSETTS INSTITUTE OF 
TECHNOLOGY 
February, 1969 


Signature of Author
Department of Electrical Engineering
February 24, 1969


Certified by
Thesis Supervisor


Accepted by
Chairman, Departmental Committee
on Graduate Students


A FORMAL SYSTEM 
FOR DEFINING THE SYNTAX AND SEMANTICS 
OF COMPUTER LANGUAGES 


by 


Henry Francis Ledgard 


Submitted to the Department of Electrical Engineering on 
February 24, 1969 in partial fulfillment of the requirements 
for the degree of Doctor of Philosophy. 


ABSTRACT 


The thesis of this dissertation is that formal definitions 
of the syntax and semantics of computer languages are needed. 
This dissertation investigates two candidates for formally 
defining computer languages: 

(1) the formalism of canonical systems for defining
the syntax of a computer language and its translation into
a target language, and

(2) the formalisms of the λ-calculus and extended
Markov algorithms as a combined formalism used as the basis
of a target language for defining the semantics of a computer
language.


Formal definitions of the syntax and semantics of SNOBOL/1 
and ALGOL/60 are included as examples of the approach.


Thesis Supervisor: Edward L. Glaser 

Title: Associate Professor of Electrical Engineering, M.I.T.
(currently Chairman, Department of Information and 
Computer Sciences, Case Western Reserve University) 




ACKNOWLEDGEMENT 


To Professor Edward Glaser, whose insight and imagination
have sparked my enthusiasm and prompted many major developments
throughout this dissertation;

To Professor John Wozencraft, whose warm guidance and
penetrating criticisms have motivated a standard that this
dissertation can only approximate;

To Professor Robert Graham, whose practical understanding
of computer languages has helped initiate and direct
this dissertation;

To Peter Landin, who patiently devoted hours teaching me
his ideas on computer languages;

To Professor John Donovan, for his collaboration on
canonic systems;

To Calvin Mooers, for many lively discussions on key
issues;

To Leon Groisser, for his wise and thoughtful comments on
my life as a student;

And to my parents, whose lifelong support has been
invaluable.


"Work reported herein was supported (in part) by
Project MAC, an M.I.T. research program sponsored
by the Advanced Research Projects Agency, Department
of Defense, under Office of Naval Research
Contract Number Nonr-4102(01). Reproduction in
whole or in part is permitted for any purpose of
the United States Government."


A Virtuoso Typist: Mrs. Lila 8. Hartmann 




STATEMENT OF ORIGIN 


I gratefully acknowledge the following men, upon whose 
work this dissertation is heavily based. In particular: 




The formalism of canonical systems is due to Emil 
Post and Raymond Smullyan. 


The application of "canonic" systems to specify the
syntax of a computer language was first made by
John Donovan.


The notion of a defining canonical system and its 
use in formalizing derivations appeared earlier in 
works by Smullyan and Donovan. 


The formalism of the λ-calculus is due to Alonzo
Church.


The application of the λ-calculus to define a portion
of the semantics of a computer language was
first made by Peter Landin.


The characterizations of the semantics of ALGOL/60 
and of the evaluator for the target language are 
based in part on similar characterizations by Landin. 


The formalism of Markov algorithms is due to A. A. 
Markov. 


The notion of adding string variables to Markov 
algorithms is due to A. Caracciolo. 


The application and integration of the above work to 
define the syntax and semantics of computer languages is the 
principal contribution of this dissertation. In particular: 




The application of canonical systems to define the 
translation of computer languages is due to the 
author. 


The application of defining canonical systems to define
notational abbreviations is new.


The notation for canonical systems and the uniform
notation for defining canonical systems are for the
most part new.


The application of the λ-calculus and (extended)
Markov algorithms to define the primitive functions
in a computer language is new.


The application of (extended) Markov algorithms to 
define the operation of an evaluator for the target 
language for characterizing semantics is new. 


The definitions of the syntax and semantics of 
SNOBOL/1 and ALGOL/60 are new. 


TABLE OF CONTENTS


I.    INTRODUCTION

II.   CANONICAL SYSTEMS: A SELF-EXTENDING FORMALISM
      FOR SPECIFYING THE SYNTAX OF A COMPUTER LANGUAGE
      AND ITS TRANSLATION INTO A TARGET LANGUAGE

      1. Canonical Systems
         a. The Basic Formalism
         b. Application to Specify Syntax
         c. Application to Specify Translation

      2. Defining Canonical Systems
         a. The Notion of a Defining Canonical System
         b. Application to Derive Syntactically
            Legal Programs and Their Translations
         c. Application to Specify
            Notational Abbreviations

      3. Discussion

III.  EXTENDED MARKOV ALGORITHMS AND λ-CALCULUS:
      A COMBINED FORMALISM USED AS THE BASIS OF
      A LANGUAGE FOR DEFINING SEMANTICS

      1. The Target Language
         a. Extended Markov Algorithms
         b. The λ-Calculus
         c. The Marriage of Extended Markov
            Algorithms to the λ-Calculus
         d. The Target Language

      2. An Evaluator for the Target Language

      3. Discussion

IV.   A SPECIFICATION OF THE SYNTAX
      AND SEMANTICS OF SNOBOL/1

V.    A SPECIFICATION OF THE SYNTAX
      AND SEMANTICS OF ALGOL/60

VI.   DISCUSSION

REFERENCES

BIOGRAPHICAL NOTE


APPENDICES


1.  CANONICAL SYSTEMS
    1.1 Canonical System Specifying the
        Syntax of a Subset of ALGOL/60
    1.2 Canonical System Specifying the Translation
        of the Subset into Assembler Language
    1.3 A Defining Canonical System for the Subset
    1.4 Derivation of a Syntactically Legal Program
        and its Translation into Assembler Language

2.  THE TARGET LANGUAGE
    2.1 Canonical System Specifying the Translation
        of the ALGOL/60 Subset into the Target Language
    2.2 Definition of the Primitive
        Functions in the Subset
    2.3 Definition of an Evaluator
        for the Target Language

3.  SNOBOL/1
    3.1 Canonical System Specifying Syntax
    3.2 Canonical System Specifying
        Translation into the Target Language
    3.3 Definition of Primitive Functions

4.  ALGOL/60
    4.1 Canonical System Specifying Syntax
    4.2 Canonical System Specifying
        Translation into the Target Language
    4.3 Definition of Primitive Functions

5.  THEORETICAL BACKGROUND OF CANONICAL SYSTEMS


ILLUSTRATIONS


Cartoon Based on "Machines Should Work, People
Should Think," Slogan from IBM Television
and Magazine Advertisements

Vending Machine of the Future


DEFINITIONS 


The following words are used like household words in
this dissertation:


Symbol:        A character or any indivisible sequence of
               characters.

Alphabet:      A set of symbols.

String:        A sequence of symbols on an alphabet.

Language:      A set of strings.

Syntax:        The set of rules specifying the strings in a
               language.

Semantics:     The set of rules relating the strings in a
               language to the "behavior" or "objects" that
               the strings denote. For a computer language
               implemented by translating the strings in the
               language into strings in a target language,
               the behavior or objects that a string denotes
               is defined by the corresponding target language
               string, whose meaning is presumably understood.

Translation:   A function mapping one set of strings into
               another set of strings.

Abbreviation:  A bijective function mapping one set of
               strings (the unabbreviated strings) into
               another set of strings (the abbreviated
               strings). The bijectiveness of the function
               insures the unique reversibility of the mapping.


Machines should work, people should think.*


*Slogan from IBM television and magazine advertisements


CHAPTER I


INTRODUCTION 


This dissertation has a thesis: that formal defini- 
tions of the syntax and semantics of computer languages are 
needed. The formal system presented here was developed as 
a step towards meeting this objective. 

There already exist formalisms, languages, and techniques 
for defining syntax and semantics. To be successful, a de- 
fining mechanism (or for that matter a computer language) 
should be simple, do clever things, and at the same time dis- 
play fundamental principles about the objects being defined. 
Most methods for defining computer languages do not satisfy 
these criteria. The objective of this dissertation was to 
attempt to meet these criteria, to develop a lucid and uniform 
method for defining computer languages. A formal approach to 
language definition was taken in the hope that this approach 
would gain a degree of precision, simplicity and theoretical 
power. Although these virtues are not completely satisfied 
in this dissertation, I believe the formal system presented 
here excels existing methods for defining the syntax and 
semantics of a computer language. The shortcomings of this
approach to language definition and recommendations for
future research in removing these shortcomings are discussed
in the conclusions of Chapters II and III and in Chapter VI.


Research generally progresses in two directions: in 
the development of new theories, and in the application and 
simplification of existing theories. This research is a
study in the second direction. In particular, an attempt 
has been made to keep the notation and terminology of the 
formal system as simple as possible. It is natural for the 
author of a work to introduce notation, terminology, and 
conventions that became convenient for him to use, but which 
often obscure the work and its contributions to others. This 
author has tried to avoid this temptation. 

The formal system for defining syntax and semantics will 
be given in two parts. First, Chapter II presents the for- 
malism of canonical systems, which will be used to define the 
syntax of a computer language and its translation into an 
arbitrary target language. Second, Chapter III presents the 
formalisms of extended Markov algorithms and the λ-calculus,
which will be used as the basis for a particular target 
language for defining the semantics of a computer language. 
The semantics of the target language are specified, in turn, 
by giving an extended Markov algorithm definition of a func- 
tion for mapping a string in the target language into a 
string denoting its value. 

Chapters IV and V illustrate the formal system by defining
the syntax and semantics of the computer languages
SNOBOL/1 and ALGOL/60. In particular, Chapter IV describes
SNOBOL/1 in the spirit of providing a reference manual for
SNOBOL/1, and is directed to the reader who wishes a detailed
knowledge of the language. Chapter V not only explicates
the formal definition of ALGOL/60 but also relates the formal
definition to other languages and other methods of language
definition. Finally, Chapter VI contains a discussion of the
utility of the formal system in defining computer languages.


CHAPTER II


CANONICAL SYSTEMS: A SELF-EXTENDING FORMALISM
FOR SPECIFYING THE SYNTAX OF A COMPUTER LANGUAGE 
AND ITS TRANSLATION INTO A TARGET LANGUAGE 


This chapter presents the formalism of canonical systems
and its application to define the syntax of a computer language
and its translation into a target language.

The mathematical underpinnings of canonical systems are due
to Emil Post and Raymond Smullyan. Canonical systems can be
used to specify any "recursively enumerable" set. The set
of strings comprising all syntactically legal programs in a
computer language and the set of pairs of strings comprising
all syntactically legal programs in a computer language and
their translations into a target language are just two examples
of recursively enumerable sets. Presumably, canonical systems
can specify any translation or algorithm that a machine can
perform. Heuristic evidence that this statement is true is
due to the works of Turing and Kleene [30,31]. In these works
the functions computable by a Turing machine were asserted [30]
to comprise every function or algorithm that is intuitively
computable by machine, and the functions computable by a
Turing machine were shown equivalent [31,32] to the set of all
"general recursive" sets, which are encompassed by canonical
systems.

The application of a logically modified variant of the
formal systems of Post, Smullyan, and Trenchard More to
specify completely the syntax of a computer language was first
made by John Donovan [3,5]. Donovan applied his formal system
to specify the set of legal programs in a computer language,
including the specification of allowable character spacing,
and more importantly, the specification of context-sensitive
requirements on the set of legal programs, like the requirement
that all statement labels in a program be different.

Donovan introduced the term "canonic systems" (in recognition
of Post's work) to describe his formal system. Although
Donovan's formal system is not used here, many ideas
and techniques presented here have stemmed from Donovan's
work. The name "canonical systems" is used to distinguish
the formal system presented in this dissertation from the
formal systems of Post, Smullyan and Donovan. A discussion
of the theoretical background for canonical systems (as presented
here) is given in Appendix 5. The terminology for
canonical systems presented here is due to both Post and
Smullyan. The notation for canonical systems presented here
is due in part to Post, Smullyan and Donovan, and is in
large part new. Many hours were spent in developing the notation
presented here in the hope that the notation would be
well-suited to computer languages. Discussions with Calvin
Mooers have had a major effect on the notation.

To illustrate by example the techniques used in specifying
the syntax and translation of a computer language with
canonical systems, a small and rather useless subset
of ALGOL/60 will be taken as a source language, while IBM
System/360 assembler language will be taken as a target
language. The Backus-Naur form specification of the ALGOL/60
subset is given below:


<DIGIT>      ::=  1 | 2 | 3
<VAR>        ::=  A | B
<PRIMARY>    ::=  <DIGIT> | <VAR>
<ARITH EXP>  ::=  <PRIMARY> | <ARITH EXP> + <PRIMARY>
<STM>        ::=  <VAR> := <ARITH EXP>
<TYPE LIST>  ::=  A | B | A,B
<DEC>        ::=  INTEGER <TYPE LIST>
<PROGRAM>    ::=  BEGIN <DEC> ; <STM> END


This subset allows programs containing only one declaration 
and one limited type of arithmetic assignment statement. 

The rules for constructing a canonical system definition
of a computer language, the rules for abbreviating a canonical
system, and the rules for deriving strings defined by a
canonical system will be presented informally in Section 2.1
of this chapter using the English language. In Section 2.2
these rules will be formally stated using the notion of a
defining canonical system. In particular, each underlined
expression in the next section will be defined formally in
Section 2.2 with a defining canonical system. I now proceed
to the informal definition of canonical systems and the application
of this formalism to specify the syntax and translation
of a computer language.


2.1 Canonical Systems 


2.1a The Basic Formalism


A canonical system consists of a collection of the follow-
ing items: 


(1) An alphabet A, called the object alphabet. 


(2) An alphabet P, called the predicate alphabet. Each 
predicate in the predicate alphabet is assigned a 
unique positive integer called its degree. 


(3) An alphabet V, called the variable alphabet. 


(4) Another alphabet, which consists of six punctuation 
symbols, the implication sign, conjunction sign, 
tuple sign, delimiter sign, left bracket sign, and 
right bracket sign. 


(5) A finite sequence of strings that are well-formed
productions, according to the definition given 
below. 

In a well-formed production, it is necessary to be able
to determine the alphabet from which each symbol is drawn.
Accordingly, I will use (a) lower case English letters (possibly
subscripted or superscripted) for variable alphabet
symbols, (b) strings of capital English letters, digits, and
spaces, each separated by a tuple sign, for predicate alphabet
symbols, (c) the symbols

    →    implication sign
    ,    conjunction sign
    :    tuple sign
    ;    delimiter sign
    <    left bracket sign
    >    right bracket sign

for punctuation symbols, and (d) symbols not in alphabets (2),
(3) and (4) for object alphabet symbols.
A well-formed term consists of a sequence of variable
and object alphabet symbols (e.g., "a+p" and "uv"). A
well-formed term tuple consists of a sequence of terms each
separated by a tuple sign and enclosed by a left and right
bracket sign (e.g., "<a+p:uv>"). A well-formed atomic formula
consists of a predicate alphabet symbol followed by a term
tuple (e.g., "ARITH EXP:VARS<a+p:uv>"). A well-formed production
consists of (a) an atomic formula followed by the
delimiter sign (e.g., "ARITH EXP:VARS<a+p:uv>;"), or (b) a
sequence of atomic formulas each separated by the conjunction
sign and followed by the implication sign, another atomic
formula, and the delimiter sign (e.g., "PRIMARY:VARS<p:v>,
ARITH EXP:VARS<a:u> → ARITH EXP:VARS<a+p:uv>;"). An atomic
formula occurring before the implication sign is called a
premise. An atomic formula following the implication sign
or occurring alone is called a conclusion. A production containing
no premises is called an atomic production.

In the specification of written expressions in computer
languages, it will often be necessary to include English
letters, digits, spaces, and the punctuation symbols as members
of the object alphabet. Since predicate alphabet characters,
the implication sign, conjunction sign, and delimiter
sign cannot occur within the brackets of a term tuple, I
adopt the convention that these symbols can be used in a term
tuple as object alphabet symbols. Furthermore, let the quotation
marks "“" and "”" be symbols not contained in the object
alphabet. Strings containing variable alphabet symbols, the
tuple sign, left bracket sign and right bracket sign can
also be used as members of the object alphabet provided that
the strings are enclosed by the quotation marks when used
within a production. For example, consider the following
productions:

    VAR<A>;
    VAR<“x”>;
    VAR<v> → ARITH EXP:VARS<v:v,>;
    VAR<v>, ARITH EXP:VARS<a:u> → ARITH EXP:VARS<a+v:uv,>;

Here, the symbols {A x + ,} enclosed in angle brackets are
object alphabet symbols. The symbols {a v u} are variable
alphabet symbols.

A derivation is a string that can be obtained from a
canonical system using the following two rules:

(1) If c; is a production containing no premises, then
    the string c can be derived from the canonical system.

(2) If p → c; is a production with premises p, and q → d;
    is an instance of this production with each variable
    in the production replaced by some object string,
    and each premise in q has been previously derived,
    then the string d can be derived from the canonical
    system.

These rules can be applied to the previously given productions
to derive the strings

    VAR<A>                   VAR<x>
    ARITH EXP:VARS<A:A,>     ARITH EXP:VARS<A+x+A:A,x,A,>

The strings derivable from a canonical system will be interpreted
in the following way. A predicate will be interpreted
as the name of a set; the term tuple following a predicate
will be interpreted as a string that is a member of the named
set. In the above case, the set "VAR" contains two members,
the strings "A" and "x". The set "ARITH EXP:VARS" contains
an infinite number of members, some of which are "A:A," and
"A+x+A:A,x,A,". Furthermore, I will follow the convention
that each string of predicate characters separated by a tuple
sign will be called a predicate part, and that predicates
of degree k will consist of either one or k predicate parts.
In the case where a predicate of degree k consists of k predicate
parts (e.g., "ARITH EXP:VARS"), each predicate part of the
predicate will be some mnemonic describing the intended interpretation
of the corresponding term in the associated term
tuple (e.g., in the atomic production "ARITH EXP:VARS
<a+p:uv>" the string "a+p" is interpreted as an arithmetic
expression and the string "uv" is interpreted as the list of
variables used in the arithmetic expression). The predicate
parts and terms occurring after the tuple sign in an atomic
production will be called "auxiliary" predicate parts and
"auxiliary" terms (in the above case the term "uv" is the
auxiliary term for the auxiliary predicate part "VARS").
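
The convention relating predicate parts to terms can be illustrated with a short sketch. The Python below is not part of the formal system; the function name parse_atomic_formula is hypothetical, the ASCII characters ":", "<" and ">" are assumed to stand for the tuple and bracket signs, and the quotation-mark convention for object strings is ignored.

```python
def parse_atomic_formula(formula):
    """Split an atomic formula into its predicate parts and its terms.

    Assumption: ':' is the tuple sign and '<' / '>' are the bracket signs.
    """
    head, _, rest = formula.partition("<")
    if not rest.endswith(">"):
        raise ValueError("term tuple must be enclosed in brackets")
    parts = head.split(":")          # predicate parts, e.g. ARITH EXP and VARS
    terms = rest[:-1].split(":")     # terms of the term tuple
    # A predicate of degree k has either one or k predicate parts.
    if len(parts) not in (1, len(terms)):
        raise ValueError("predicate parts do not match the degree")
    return parts, terms


parts, terms = parse_atomic_formula("ARITH EXP:VARS<a+p:uv>")
# The first part and term are the main ones; the rest are auxiliary.
auxiliary = list(zip(parts[1:], terms[1:]))
```

Here "auxiliary" pairs the auxiliary predicate part "VARS" with its auxiliary term "uv", matching the reading given in the text.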

For example, next consider the following canonical system
specifying a set named "ARITH EXP:VARS", consisting of all
pairs of strings such that the first element of each pair
is an arithmetic expression in the subset of ALGOL/60, and
the second element of each pair is a list of the variables
occurring in the arithmetic expression:*


1    DIGIT<1>;
     DIGIT<2>;
     DIGIT<3>;
2    VAR<A>;
     VAR<B>;
3.1  DIGIT<d> → PRIMARY:VARS<d:Λ>;
3.2  VAR<v> → PRIMARY:VARS<v:v,>;
3.3  PRIMARY:VARS<p:v> → ARITH EXP:VARS<p:v>;
3.4  PRIMARY:VARS<p:v>, ARITH EXP:VARS<a:u>
         → ARITH EXP:VARS<a+p:uv>;


These productions can be interpreted:


The symbol "1" is a member of the set named "DIGIT".
The symbol "2" is a member of the set named "DIGIT".
The symbol "3" is a member of the set named "DIGIT".
The symbol "A" is a member of the set named "VAR".
The symbol "B" is a member of the set named "VAR".

If "d" represents a member of the set named "DIGIT",
then the pair of strings denoted by "d:Λ" is a member of the
set named "PRIMARY:VARS".

If "v" represents a member of the set named "VAR",
then the pair of strings denoted by "v:v," is a member of the
set named "PRIMARY:VARS".

If the pair "p:v" represents a member of the
set named "PRIMARY:VARS",
then the pair of strings denoted by "p:v" is a member of the
set named "ARITH EXP:VARS".

If the pair "p:v" represents a member of the set named
"PRIMARY:VARS",
and the pair "a:u" represents a member of the set named
"ARITH EXP:VARS",
then the pair of strings denoted by "a+p:uv"
is a member of the set named
"ARITH EXP:VARS".


*The symbol "Λ" denotes the null string, i.e., if P is a
string then PΛ = P = ΛP.


or more informally:


1.   The symbols "1", "2" and "3" are digits.
2.   The symbols "A" and "B" are variables.
3.1  If "d" is a digit,
     then "d" is a primary with a null list of variables.
3.2  If "v" is a variable,
     then "v" is a primary with a list "v," of variables.
3.3  If "p" is a primary with a list of variables "v",
     then "p" is an arithmetic expression with the same list of
     variables "v".
3.4  If "p" is a primary with a list of variables "v"
     and "a" is an arithmetic expression with a list of
     variables "u",
     then "a+p" is an arithmetic expression with a list of
     variables "uv".


The rules for deriving strings specified by a canonical
system can be applied to these productions to conclude that
(a) the set named "DIGIT" consists of three members, the
symbols "1", "2" and "3", (b) the set named "PRIMARY:VARS"
consists of five members, the pairs of strings "1:Λ",
"2:Λ", "3:Λ", "A:A,", and "B:B,", and (c) the set named
"ARITH EXP:VARS" contains an infinite number of members,
some of which are "A:A,", "1+2:Λ", "A+B:A,B,", and
"A+1+2+A+B:A,A,B,".
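
The two derivation rules can be mimicked mechanically for this small system. The following Python sketch is only an illustration under stated assumptions: the function names primaries and arith_exps are hypothetical, the null string is represented by the empty Python string, and the infinite set "ARITH EXP:VARS" is cut off at a chosen number of primaries.

```python
DIGITS = ["1", "2", "3"]   # productions 1
VARS = ["A", "B"]          # productions 2


def primaries():
    """Pairs (primary, variable list) given by productions 3.1 and 3.2."""
    pairs = [(d, "") for d in DIGITS]        # 3.1: null list, written "" here
    pairs += [(v, v + ",") for v in VARS]    # 3.2: the list "v,"
    return pairs


def arith_exps(max_primaries):
    """Pairs derivable for ARITH EXP:VARS with at most max_primaries primaries."""
    prim = primaries()
    derived = list(prim)                     # 3.3: every primary pair
    frontier = list(prim)
    for _ in range(max_primaries - 1):
        # 3.4: from a:u and p:v conclude a+p:uv
        frontier = [(a + "+" + p, u + v)
                    for (a, u) in frontier
                    for (p, v) in prim]
        derived += frontier
    return derived


pairs = set(arith_exps(5))
```

With this cutoff the set "pairs" contains, among others, the four example members named in the text, with the null list appearing as an empty string.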


Abbreviations to the Basic Notation: 


Using only the basic notation for a canonical system, a
specification for a computer language often becomes lengthy.
It will be convenient during the course of this dissertation
to abbreviate some canonical system constructions. Here, I
introduce four simple and useful abbreviations, the first
two of which are due to Donovan [3,5]. The ability of canonical
systems to define abbreviations formally will be discussed
in Section 2.2c.


1.a  If c1, c2, ..., and cn are conclusions with identical
     premises p, the productions

         p → c1;  p → c2;  ...  p → cn;

     can be abbreviated

         p → c1, c2, ..., cn;

1.b  If c1, c2, ..., and cn are conclusions with no premises,
     the productions

         c1;  c2;  ...  cn;

     can be abbreviated

         c1, c2, ..., cn;

2.   If <t1>, <t2>, ..., and <tn> are term tuples denoting
     members of the same set S, the atomic formulas

         S<t1>, S<t2>, ..., S<tn>

     can be abbreviated

         S<t1>,<t2>,...,<tn>

3.   If p1, p2, ..., and pn are premises with the same
     conclusion c, the productions

         p1 → c;  p2 → c;  ...  pn → c;

     can be abbreviated

         p1 | p2 | ... | pn → c;

4.   If a and b are different variables, and P and R are
     predicates, the productions

         P<a> → R<a>;  P<a>, R<b> → R<ba>;

     can be abbreviated

         P<a> → R<SEQ(a)>;

Thus, the productions*

(a) DIGIT<1>; DIGIT<2>; DIGIT<3>;
(b) DIGIT<p> → CHAR<p>; LETTER<p> → CHAR<p>;
    MARK<p> → CHAR<p>;
(c) DIGIT<d> → DIGIT STR<d>; DIGIT<d>, DIGIT STR<s>
    → DIGIT STR<sd>;


can be abbreviated


(a) DIGIT<1>,<2>,<3>;
(b) DIGIT<p> | LETTER<p> | MARK<p> → CHAR<p>;
(c) DIGIT<d> → DIGIT STR<SEQ(d)>;


The abbreviated productions may informally be read:


(a) The symbols "1", "2", and "3" are digits.

(b) If p is a digit, or p is a letter, or p is a mark,
    then p is a character.

(c) If d is a digit, then a sequence of digits is a digit
    string.
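
Since each abbreviation is meant to be reversible, the expansion of abbreviations 3 and 4 can be sketched as a simple rewriting of strings. The Python below is a minimal sketch, not the formal treatment of Section 2.2c: the function names are hypothetical, the ASCII string "->" is assumed to stand for the implication sign, and the fresh variable chosen in expand_seq is an arbitrary assumption.

```python
import re


def expand_alternatives(production):
    """Undo abbreviation 3: 'p1 | p2 -> c;' becomes 'p1 -> c;' and 'p2 -> c;'."""
    body = production.strip().rstrip(";")
    premises, conclusion = body.split("->")
    return [f"{p.strip()} -> {conclusion.strip()};"
            for p in premises.split("|")]


def expand_seq(production):
    """Undo abbreviation 4: 'P<a> -> R<SEQ(a)>;' becomes two productions."""
    m = re.match(r"^(.+?)<(\w)>\s*->\s*(.+?)<SEQ\((\w)\)>;$", production.strip())
    if m is None or m.group(2) != m.group(4):
        raise ValueError("not an instance of abbreviation 4")
    p, a, r = m.group(1), m.group(2), m.group(3)
    b = "s" if a != "s" else "t"   # any variable different from a will do
    return [f"{p}<{a}> -> {r}<{a}>;",
            f"{p}<{a}>, {r}<{b}> -> {r}<{b}{a}>;"]


chars = expand_alternatives("DIGIT<p> | LETTER<p> | MARK<p> -> CHAR<p>;")
digit_str = expand_seq("DIGIT<d> -> DIGIT STR<SEQ(d)>;")
```

Applied to abbreviated productions (b) and (c), the two functions return the unabbreviated productions (b) and (c) listed earlier, up to the choice of arrow symbol.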


*Productions (b) and (c) are from the canonical system defining
the syntax of ALGOL/60.


2.1b Application to Specify Syntax


I define the syntax of a language as the set of rules
specifying the strings in a language. The syntax of ALGOL/60
has the requirement that the type of each variable used in
a program must be declared. This requirement is not handled
by the Backus-Naur form specification of the ALGOL/60 subset
given previously. For example, the syntactically illegal
string

    BEGIN INTEGER B; A:=1 END

can be derived using this specification. This requirement
can readily be handled with a canonical system definition of
the subset by


(a) specifying with each statement an auxiliary term
    specifying the list of variables used in the
    statement,

(b) specifying with each declaration an auxiliary term
    specifying the list of variables declared, and

(c) adding a premise to the production for a legal
    program specifying that each variable occurring
    in the list in (a) must be contained in the list
    in (b).

The canonical system for the subset of ALGOL/60 is given
in Appendix 1.1a. There the second element in the term tuple
for a primary, arithmetic expression, statement, and declaration
specifies the list of variables used or declared in the
corresponding source language string. The restrictive premise
"IN<u:v>" (production 5) insures that each of the variables
in the list "u" is contained in the list of declared variables
"v". For example, the following pairs of lists are members
of the set named "IN" (productions 6)

    <A,:A,B,>   <B,:A,B,>   <A,B,:A,B,>   <A,B,A,B,:A,B,>

Thus the string

    BEGIN INTEGER A; A:=1 END

is specified by this canonical system, whereas the illegal
string

    BEGIN INTEGER B; A:=1 END

is not specified by this canonical system because the pair
<A,:B,> is not a member of the set named "IN".
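
The difference between the Backus-Naur form specification and the canonical system can be made concrete with a small sketch. The Python below is an illustration only and is not the canonical system of the appendix: the function names are hypothetical, the regular expression assumes the exact spacing used in the example strings, and the "IN" premise is modelled as a simple membership test on Python lists.

```python
import re

# The subset as accepted by the Backus-Naur form rules alone
# (assumption: single spaces exactly as in the example programs).
PROGRAM = re.compile(
    r"^BEGIN INTEGER (A|B|A,B); ([AB]):=([123AB](?:\+[123AB])*) END$")


def bnf_accepts(text):
    """True if the string matches the context-free shape of a program."""
    return PROGRAM.match(text) is not None


def variable_lists(text):
    """Return (variables used, variables declared), or None if ill-formed."""
    m = PROGRAM.match(text)
    if m is None:
        return None
    declared = m.group(1).split(",")
    # The left-hand variable and every variable in the expression are used.
    used = [m.group(2)] + [c for c in m.group(3) if c in "AB"]
    return used, declared


def is_legal(text):
    """Model of the premise IN<u:v>: every used variable is declared."""
    lists = variable_lists(text)
    if lists is None:
        return False
    used, declared = lists
    return all(v in declared for v in used)
```

Under these assumptions bnf_accepts admits "BEGIN INTEGER B; A:=1 END" while is_legal rejects it, which is the distinction the added premise is meant to capture.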


An Abbreviation for Specifying Syntax: 


In the specification of computer languages, it will be
frequently necessary to write productions that specify auxiliary
lists with a given source language construction. For
example, consider the productions from Appendix 1.1a


3.1  DIGIT<d> → PRIMARY:VARS<d:Λ>;

3.4  PRIMARY:VARS<p:v>, ARITH EXP:VARS<a:u>
         → ARITH EXP:VARS<a+p:uv>;


Here the auxiliary terms corresponding to the predicate part
"VARS" specify the list of variables used in each construction.
Productions like these, in which


(a) an auxiliary term for an auxiliary predicate part
    in a conclusion is given as "Λ", and the auxiliary
    predicate part does not occur in a premise (e.g.,
    the auxiliary term "Λ" for the predicate part
    "VARS" in production 3.1), or

(b) an auxiliary term for an auxiliary predicate part
    in a premise is a variable, and the auxiliary term
    for the same predicate part in a conclusion contains
    one occurrence of the variable (e.g., the
    variables "u" and "v" for the predicate part "VARS"
    in production 3.4),


occur frequently in canonical systems for computer languages.
It is convenient not to have to specify explicitly the auxiliary
terms and their predicate parts in these cases. I
therefore introduce the following abbreviation:


(a) If p is an auxiliary predicate part occurring only
       in the conclusion of a production,
    and the term t corresponding to p is given as null,
    then ":p" and ":t" can be deleted from the production.

(b) If p is an auxiliary predicate part occurring in a
       premise and a conclusion,
    and the term t corresponding to the occurrence of
       p in the premise is given as a variable,
    and the term u corresponding to the occurrence of
       p in the conclusion contains one occurrence
       of the variable,
    and the variable does not occur elsewhere in the
       production,
    then the occurrence of ":p" and ":t" in the premise
       and the occurrence of the variable in the conclusion
       can be deleted.


Thus production 3.1 above can be abbreviated


3.1    DIGIT<d> → PRIMARY:VARS<d:Λ>;
3.1'   DIGIT<d> → PRIMARY<d>;                    (use abr a)


and production 3.4 above can be abbreviated


3.4    PRIMARY:VARS<p:v>, ARITH EXP:VARS<a:u>
           → ARITH EXP:VARS<a+p:uv>;
3.4'   PRIMARY<p>, ARITH EXP:VARS<a:u>
           → ARITH EXP:VARS<a+p:u>;              (use abr b)
3.4''  PRIMARY<p>, ARITH EXP<a>
           → ARITH EXP:VARS<a+p:Λ>;              (use abr b)
3.4''' PRIMARY<p>, ARITH EXP<a> → ARITH EXP<a+p>; (use abr a)


To obtain the unabbreviated equivalent of a production
to which this abbreviation has been applied, one can


(a) Write down the abbreviated production.

(b) Write down the corresponding unabbreviated predicates
    used in the production.

(c) Specify for each predicate part occurring only in
    the conclusion a corresponding null term.

(d) Specify for each predicate part occurring both in
    a premise and in a conclusion a term that consists
    of a variable that does not occur elsewhere in the
    production.

Using rule (c), the production corresponding to


(prod 3.1')    DIGIT<d> → PRIMARY<d>;
(predicates)   DIGIT      PRIMARY:VARS


can be unabbreviated


3.1  DIGIT<d> → PRIMARY:VARS<d:Λ>;


Using rule (d), the production corresponding to


(prod 3.4''')  PRIMARY<p>, ARITH EXP<a> → ARITH EXP<a+p>;
(predicates)   PRIMARY:VARS  ARITH EXP:VARS  ARITH EXP:VARS


can be unabbreviated*


PRIMARY:VARS<p:v>, ARITH EXP:VARS<a:u> → ARITH EXP:VARS<a+p:uv>;


To insure the unique reversibility of this abbreviation, the 
first predicate part of each different predicate must be 
different, and the order in which added variables occur within 


the conclusion must be immaterial. 


*The variables "u" and "v" added to production 3.4''' need not
be identical to those given in production 3.4. A production
with different variables is equivalent in that each defines
the same set of strings.
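
The unabbreviation steps (a) through (d) can be sketched in Python. This is only an illustrative sketch under stated assumptions, not part of the formalism: the table PREDICATES, the function names, and the representation of an atomic formula as a (first predicate part, list of terms) pair are all hypothetical, and only fully abbreviated productions (such as 3.1' and 3.4''') are handled.

```python
NULL = "Λ"  # the null term

# Hypothetical table: first predicate part -> all predicate parts (step b).
PREDICATES = {
    "DIGIT": ["DIGIT"],
    "PRIMARY": ["PRIMARY", "VARS"],
    "ARITH EXP": ["ARITH EXP", "VARS"],
}


def unabbreviate(premises, conclusion, predicates=PREDICATES):
    """Restore auxiliary predicate parts and terms.

    premises: list of (first_part, [terms]); conclusion: (first_part, [terms]).
    """
    # Collect every symbol already used, so added variables are new.
    used = set("".join(t for _, terms in premises + [conclusion] for t in terms))
    fresh = (v for v in "uvwxyz" if v not in used)

    added = {}  # auxiliary part -> variables introduced in the premises
    full_premises = []
    for name, terms in premises:
        parts = predicates[name]
        terms = list(terms)
        for part in parts[len(terms):]:
            var = next(fresh)  # step (d): a variable not occurring elsewhere
            added.setdefault(part, []).append(var)
            terms.append(var)
        full_premises.append((parts, terms))

    name, terms = conclusion
    parts = predicates[name]
    terms = list(terms)
    for part in parts[len(terms):]:
        # step (d): carry the premise variables; step (c): null if there are none
        terms.append("".join(added.get(part, [])) or NULL)
    return full_premises, (parts, terms)


def show(formula):
    parts, terms = formula
    return f"{':'.join(parts)}<{':'.join(terms)}>"


prems, concl = unabbreviate(
    [("PRIMARY", ["p"]), ("ARITH EXP", ["a"])], ("ARITH EXP", ["a+p"])
)
print(", ".join(show(f) for f in prems), "→", show(concl))
```

The variables chosen by the sketch need not coincide with those of production 3.4; as the footnote above observes, a production with different variables defines the same set of strings.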




Using this and the previously given abbreviations, the
canonical system of Appendix 1.1a has been abbreviated into the
canonical system of Appendix 1.1b. The abbreviated canonical
system can be viewed quite differently from its unabbreviated
equivalent. For example, consider the abbreviated productions


3.2' VAR<v> → PRIMARY:VARS<v:v,>;
3.3' PRIMARY<p> → ARITH EXP<p>;


and their unabbreviated equivalents


3.2 VAR<v> → PRIMARY:VARS<v:v,>;
3.3 PRIMARY:VARS<p:v> → ARITH EXP:VARS<p:v>;

In production 3.2, a new auxiliary term "v," is specified for
the auxiliary predicate part "VARS" and this auxiliary
predicate and term are specified in the abbreviated production
3.2'. In production 3.3, however, the auxiliary list of
variables is carried unchanged from the premise to the
conclusion, and this list is not specified in the abbreviated
production 3.3'.

Furthermore, consider the production


5. STM:VARS<s:u>, DEC:DEC VARS<d:v>, IN<u:v>
→ PROGRAM<BEGIN d; s END>;


Here the auxiliary lists of variables "u" and "v" are
constrained by the premise "IN<u:v>", and hence the auxiliary
predicate parts and terms for these lists occur in both the
abbreviated and unabbreviated productions.




Thus the auxiliary terms referring to the lists of vari- 
ables and their associated auxiliary predicate parts are explicitly 
specified only when a new variable is added to the list (produc- 
tions 3.2, 3.5 and 4.2) or when the list is required to have 
certain properties (production 5.). In languages like 
SNOBOL/1 and ALGOL/60, where the number of auxiliary terms is 
large, the abbreviation just given markedly reduced the size
of their canonical systems specifying syntax.


2.1c Application to Specify Translation


I define the translation of a language as the function
mapping the strings in the language into strings in some
other language. This function can be specified by a canonical
system specifying a set of pairs of strings, where the first
element in each pair is a legal string in the source language,
and the second element is a corresponding string in the
target language.

As in the previous section, I will illustrate this use
of canonical systems by example. The specification of the
syntax of the ALGOL/60 subset has been modified to specify not
only the legal strings in the subset but also their
translation into IBM System/360 assembler language. This
specification is given in Appendix 1.2a. There the term to the left
of each ".." specifies some string in the ALGOL/60 subset, and
the term to the right of each ".." specifies the representation
of the string in the target language. For example,




the following pair of strings is a member of the set named 


"PROGRAM": 


BEGIN INTEGER A; A:=1 END..*ASSEMBLER LANGUAGE PROGRAM
         BALR  15,0       *SET BASE REGISTER
         USING *,15       *INFORM ASSEMBLER
         L     1,=F'1'    *LOAD 1
         ST    1,A        *STORE RESULT IN A
         SVC   0          *RETURN TO SUPERVISOR
*STORAGE FOR VARIABLES
A        DS    F
         END


Note that this canonical system includes the specification of
the comment entries in the assembler statements so that
(hopefully) the reader will not have to be familiar with the assembler
language to understand the translation.


An Abbreviation for Specifying Translation: 


Except for the specification of strings in assembler
language, the canonical system defining the translation of the
subset is identical to the canonical system defining the syntax
of the subset. In general, since a definition of the syntax
of a language specifies the legal strings in a language and
a definition of the translation of a language specifies the
legal strings as well as their representation in some other
language, the definition of the translation of a language will
encompass the definition of the syntax of a language. This
similarity leads to the following abbreviation.

Let numbers be placed on the productions of the canonical 


systems for the syntax and translation so that a production 




specifying the translation of a string is given the same 
number as the corresponding production specifying the syntax 
of the string. Let p_s and p_t be identically numbered
productions from the canonical systems specifying respectively the
syntax and translation.

(a) If p_s and p_t are identical, then p_t can be omitted.


(b) If a premise in p_s and a premise in p_t are identical,
then the premise in p_t can be omitted.


(c) If an auxiliary predicate part and corresponding
term of atomic formulas with identical first
predicate parts in p_s and p_t are identical, then the
auxiliary predicate part and term in p_t can be
omitted.

For example, consider the production from the syntax of
the ALGOL/60 subset

5. STM:VARS<s:u>, DEC:DEC VARS<d:v>, IN<u:v>
→ PROGRAM<BEGIN d; s END>;

and the corresponding production from the translation of the
subset

5.' STM:VARS<s..s':u>, DEC:DEC VARS<d..d':v>, IN<u:v>
→ PROGRAM<BEGIN d; s END..a>;

where a represents the string that specifies the translation
of the program. Here, using rule (b), the premise "IN<u:v>"
can be omitted from the translation production, and using
rule (c) the auxiliary predicate parts and terms for the
lists "u" and "v" of variables can be omitted to yield the
abbreviated production for the translation


5.'' STM<s..s'>, DEC<d..d'> → PROGRAM<BEGIN d; s END..a>;


To obtain the unabbreviated equivalent of an abbreviated
canonical system defining translation, one must add to the
canonical system defining translation (a) the numbered
productions that occur in the canonical system for the syntax
but do not occur in the canonical system for translation, (b)
the premises that occur in a production for syntax but do not
occur in the identically numbered production for translation,
and (c) for atomic formulas with identical first predicate
parts, the auxiliary predicate parts and corresponding terms
that occur in a production for syntax but do not occur in the
identically numbered production for the translation.

For example, consider the abbreviated translation
production just given

5.'' STM<s..s'>, DEC<d..d'> → PROGRAM<BEGIN d; s END..a>;

and the corresponding production for the syntax

5. STM:VARS<s:u>, DEC:DEC VARS<d:v>, IN<u:v>
→ PROGRAM<BEGIN d; s END>;

Here, the premise "IN<u:v>" occurs in the production for the
syntax but not in the production for the translation, and the
auxiliary predicate parts and corresponding terms for the
predicate parts "VARS" and "DEC VARS" occur in the production
for the syntax but not in the production for the translation.


Adding this premise and these auxiliary predicate parts and their
terms to the abbreviated production 5.'' for the translation,
we obtain the unabbreviated production

5.' STM:VARS<s..s':u>, DEC:DEC VARS<d..d':v>, IN<u:v>
→ PROGRAM<BEGIN d; s END..a>;
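
The restoration just described can be sketched in Python. The sketch is illustrative only and rests on assumptions that are not part of the text: an atomic formula is represented as a (list of predicate parts, list of terms) pair, the terms of a formula line up one-for-one with its predicate parts wherever auxiliary parts are restored, and no two premises of a production share a first predicate part. The function name restore is hypothetical.

```python
def restore(syntax, translation):
    """Rebuild an unabbreviated translation production.

    Each production is (premises, conclusion); each atomic formula is
    (parts, terms), e.g. (["STM", "VARS"], ["s", "u"]) for STM:VARS<s:u>.
    """

    def merge(s, t):
        # Rule (c): append the auxiliary parts and terms that appear in the
        # syntax formula but were omitted from the translation formula.
        s_parts, s_terms = s
        t_parts, t_terms = t
        n = len(t_parts)
        return (t_parts + s_parts[n:], t_terms + s_terms[n:])

    s_prems, s_concl = syntax
    t_prems, t_concl = translation
    by_first = {f[0][0]: f for f in t_prems}
    s_firsts = {f[0][0] for f in s_prems}

    # Rule (b): a premise present only in the syntax production is added back.
    premises = [
        merge(s, by_first[s[0][0]]) if s[0][0] in by_first else s
        for s in s_prems
    ]
    # Premises that occur only in the translation production are kept.
    premises += [t for t in t_prems if t[0][0] not in s_firsts]
    return premises, merge(s_concl, t_concl)


syntax_5 = (
    [(["STM", "VARS"], ["s", "u"]),
     (["DEC", "DEC VARS"], ["d", "v"]),
     (["IN"], ["u", "v"])],
    (["PROGRAM"], ["BEGIN d; s END"]),
)
translation_5 = (
    [(["STM"], ["s..s'"]), (["DEC"], ["d..d'"])],
    (["PROGRAM"], ["BEGIN d; s END..a"]),
)
premises, conclusion = restore(syntax_5, translation_5)
for parts, terms in premises + [conclusion]:
    print(":".join(parts) + "<" + ":".join(terms) + ">")
```

Rule (a), the restoration of whole productions omitted from the translation, would amount to copying the syntax production unchanged and is left out of the sketch.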

The abbreviated canonical system specifying the
translation of the ALGOL/60 subset is given in Appendix 1.2b. The
abbreviated canonical system of Appendix 1.2b can be viewed
quite differently from its unabbreviated equivalent. The
abbreviated canonical system need specify only the new terms that
must be added to the canonical system specifying the syntax
in order to convert the canonical system specifying syntax
into the canonical system specifying translation. In writing
the abbreviated canonical system specifying translation, the
requirements needed to insure the syntactic legality of a
string whose translation is being specified can be omitted.
These requirements are assumed to have been specified in
the canonical system for the syntax. In languages like
SNOBOL/1 and ALGOL/60, where the number of syntactic
requirements is large, this abbreviation greatly reduced the size
of the canonical systems defining the translations of the
languages into the target language.


2.2 Defining Canonical Systems 


2.2a The Notion of a Defining Canonical System


The previous sections have been devoted to developing 




canonical systems specifying sets of strings. The strings 
represented syntactically legal programs in a subset of ALGOL/60 
and their counterparts in assembler language. The rules for 
forming and using the canonical systems for these sets were 
described informally in the text in English. The string repre- 
senting a canonical system and the rules for using the canoni- 
cal system can, in turn, be specified formally by another 
canonical system. In cases where a conflict would arise in 
distinguishing the strings of the first canonical system in 

the productions of the defining canonical system, the strings
of the first canonical system can be enclosed by the quotation
marks “ and ”.

The productions specifying the rules for constructing 
another canonical system are given in Appendix 1.3a. These 
productions specify the alphabets of object symbols, predicate 
symbols, and variable symbols, and the rules for constructing 
well-formed terms, term tuples, atomic formulas, premises, 
conclusions, productions, and finally, canonical systems.*

*In the productions of Appendix 1.3, the quotation marks have
been omitted for matching pairs of left and right brackets
that occur as object symbols. For example, in the atomic
formula "WF TERM TUPLE<<t>>", quotation marks have been omitted
from the second and third brackets. In atomic formulas of
this type, the scope of the left bracket sign extends to the
matching right bracket sign, and all brackets thus enclosed
are considered as object symbols.

The logical notion of using a second canonical system
to formalize the rules for constructing a canonical system
was first presented by Smullyan and later by Donavan. In
the works presented by Smullyan and Donavan, a notation
different from the basic notation is used in a defining canonical
system. The advantages of using quotation marks to distinguish
symbols in the defined canonical system from symbols in the
defining canonical system are that (a) the same notation is
used for all canonical systems, and (b) definitions and rules
formalized in one canonical system can be copied and applied
to other canonical systems independently of their position
in a series of defined and defining canonical systems (this
point will be discussed in section 2.2c).


2.2b Application to Derive Syntactically Legal Programs 


The rules for deriving strings specified by a canonical 
system can also be formalized with a defining canonical system. 
These rules are given in Appendix 1.3b. By adding a production 
of the form "CANONICAL SYSTEM STR<c>;", where c is some
well-formed canonical system, these productions define the rules
for deriving strings in the canonical system c. 

In particular, productions 9 specify the rules for
extracting productions from the member of the set "CANONICAL
SYSTEM STR". Production 10 specifies the rule for
substituting strings in the object alphabet in place of the variables
in the productions to obtain instances of the productions.
Productions 11 specify the rules for deriving strings specified
by the production instances.


Productions 10 and 11 can be viewed as a formalization
of the two logical rules of inference "substitution" and "modus
ponens" for deriving strings specified by a canonical system.
The substitution of object strings for variables in a
production occurs through the predicate "SUBST". The predicate
"SUBST" defines a set of 4-tuples, where the first element of
each 4-tuple is a production, the second element is a variable,
the third element is some string of object alphabet symbols, and
the fourth element is the production with each occurrence of the
variable replaced by the object string. For example, using
the canonical system of the syntax of the ALGOL/60 subset as
a member of the set "CANONICAL SYSTEM STR", the following
4-tuple can be generated as a member of the set "SUBST"

<DIGIT<d>→PRIMARY:VARS<d:Λ> : d : 1 : DIGIT<1>→PRIMARY:VARS<1:Λ>>
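
A member of the set "SUBST" can be imitated by a few lines of Python. The function name subst is hypothetical, and the sketch assumes that a variable symbol never coincides with a symbol of the predicate or object alphabets, so that plain textual replacement is safe; the canonical system of Appendix 1.3b establishes this through its own productions rather than by assumption.

```python
def subst(production, variable, object_string):
    """Return a 4-tuple in the spirit of "SUBST".

    Assumes variable symbols are disjoint from all other symbols, so each
    occurrence found by textual search really is the variable.
    """
    instance = production.replace(variable, object_string)
    return (production, variable, object_string, instance)


print(subst("DIGIT<d>→PRIMARY:VARS<d:Λ>", "d", "1"))
```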


The application of modus ponens to the production instances
of a canonical system occurs in productions 11.


11.1 DERIVATION<Λ>;

11.2 DERIVATION<d>, PROD INSTANCE<c;>, WF CONCLUSION<c>
→ DERIVATION<d c>;

11.3 DERIVATION<d>, PROD INSTANCE<p→c;>,
PREMS:DERIV CONT PREMS<p:d> → DERIVATION<d c>;


These productions can be read: 


11.1 From no premises, the null string can be derived.
11.2 If the string d has been derived,
and c; is an instance of a production that contains no
premises,
then the string c can be added to the string d.
11.3 If the string d has been derived,
and p→c; is an instance* of a production with premises p,
and the premises p are contained in the string d,
then the string c can be added to the string d.
For example, by successively using the following production
instances

DIGIT<1>;
DIGIT<1> → PRIMARY:VARS<1:Λ>;
PRIMARY:VARS<1:Λ> → ARITH EXP:VARS<1:Λ>;

the following member of the set "DERIVATION" can be generated

DIGIT<1> PRIMARY:VARS<1:Λ> ARITH EXP:VARS<1:Λ>
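
The reading of productions 11.1 through 11.3 can be sketched as a small forward-chaining loop in Python. This is an illustration under assumptions, not the defining canonical system itself: production instances are supplied ready-made as (premises, conclusion) pairs of strings, the function name derive is hypothetical, and each conclusion is added at most once so that the loop terminates.

```python
def derive(instances):
    """Build one derivation from a finite list of production instances.

    instances: list of (premises, conclusion), premises a list of strings.
    """
    derivation = []  # 11.1: the null string is derivable from no premises
    changed = True
    while changed:
        changed = False
        for premises, conclusion in instances:
            # 11.2 (no premises) and 11.3 (all premises already derived)
            if conclusion not in derivation and all(
                p in derivation for p in premises
            ):
                derivation.append(conclusion)
                changed = True
    return derivation


INSTANCES = [
    ([], "DIGIT<1>"),
    (["DIGIT<1>"], "PRIMARY:VARS<1:Λ>"),
    (["PRIMARY:VARS<1:Λ>"], "ARITH EXP:VARS<1:Λ>"),
]
print(" ".join(derive(INSTANCES)))
```

The sketch produces a single derivation; the set "DERIVATION" also contains derivations in which the same instances are applied in other admissible orders.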


Another example of a member of the set "DERIVATION" is
generated in the right-hand column of Appendix 1.4a. By simply
asserting that the canonical system defining the syntax of the
ALGOL/60 subset is a member of the set "CANONICAL SYSTEM STR"
(i.e., by simply adding the production "CANONICAL SYSTEM STR
<“DIGIT<1>; ... IN<y:z> → IN<xy:z>;”>;" to the productions
of Appendices 1.3a and 1.3b), Appendix 1.3 defines the rules
for deriving syntactically legal programs in the ALGOL/60
subset. The derivation of Appendix 1.4a specifies that the
string

BEGIN INTEGER A; A:=1 END

is a member of the set "PROGRAM".
*An instance of a production P is the production P' obtained
from P by applying substitution to all of the variables in a
production.

Yet another example of a member of the set "DERIVATION"
is generated in the right-hand column of Appendix 1.4b. By
asserting that the canonical system defining the translation
of the ALGOL/60 subset is a member of the set "CANONICAL SYSTEM
STR", Appendix 1.3 defines the rules for deriving syntactically
legal programs and their translation. The derivation of
Appendix 1.4b specifies that the string


BEGIN INTEGER A; A:=1 END..*ASSEMBLER LANGUAGE PROGRAM
         BALR  15,0       *SET BASE REGISTER
         USING *,15       *INFORM ASSEMBLER
         L     1,=F'1'    *LOAD 1
         ST    1,A        *STORE RESULT IN A
         SVC   0          *RETURN TO SUPERVISOR
*STORAGE FOR VARIABLES
A        DS    F
         END


is a member of the set "PROGRAM". 

Thus by simply adding a production asserting that some
well-formed canonical system is a member of the set "CANONICAL
SYSTEM STR", the productions of Appendix 1.3 can be used to
generate all strings defined by the canonical system.


Structural Description of Derived Strings:* 


A derivation provides a "structural description" of a
derived string. By a structural description of a string,
I mean the sequence of rules (here the sequence of productions)
used in generating the string. The sequence of rules used in
generating a string provides information about the structure
of the string.

*This application is not used in the other sections of this
dissertation.


For example, consider the derivation of Appendix 1.4a.
If we consider only the first term of each derived term tuple,
the derivation provides a structural description for the string
"BEGIN INTEGER A; A:=1 END" that may be represented in the
form of a syntactic tree:


                           PROGRAM

   BEGIN        DEC         ;          STM              END

         INTEGER  TYPE LIST      VAR   :=   ARITH EXP

                      A           A          PRIMARY

                                              DIGIT

                                                1


The tree can be constructed by scanning the derivation 
from bottom to top and constructing the corresponding tree 
from the top down. The leaves of the tree are symbols from 
the object alphabet. The nodes of the tree are the partial 
predicate names occurring in derived conclusions. The branches 
joining a node are determined by the basic symbols and the 
previously derived conclusions used to construct the newly 


derived conclusion. 
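
As an illustration of the relation between the tree and the derived string, the tree above can be written as a nested Python structure whose leaves, read from left to right, spell out the program. The branch structure used here (for instance, that "TYPE LIST" hangs under "DEC") is an assumption read off the layout of the figure, and the names TREE and leaves are hypothetical.

```python
# Each node is (partial predicate name, children); each leaf is a string
# of symbols from the object alphabet.
TREE = ("PROGRAM", [
    "BEGIN",
    ("DEC", ["INTEGER", ("TYPE LIST", ["A"])]),
    ";",
    ("STM", [
        ("VAR", ["A"]),
        ":=",
        ("ARITH EXP", [("PRIMARY", [("DIGIT", ["1"])])]),
    ]),
    "END",
])


def leaves(node):
    """Collect the leaves of the tree from left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [leaf for child in children for leaf in leaves(child)]


print(" ".join(leaves(TREE)))
```

Apart from spacing, the leaves reproduce the string "BEGIN INTEGER A; A:=1 END".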


Using a canonical system for the translation of a language,
a derivation can be used to construct a structural description
of a target language string. The System/360 assembler language
is not a "structured" language and hence the derivation of an
assembler language program is not of concern. However,
canonical systems have been used to obtain structural descriptions
of strings in a target language where knowledge of a string's
tree-like structure is important for its analysis.*


2.2c Application to Specify Notational Abbreviations 


I define an abbreviation as a bijective (one-to-one and 
onto) function mapping one set of strings (the unabbreviated 
strings) into another set of strings (the abbreviated 
strings). The bijectiveness of the function insures that we 
can recover the unabbreviated equivalent of each abbreviated 
string. I have introduced six abbreviations to the notation 
for canonical systems, four to the basic notation, one for a 
canonical system specifying syntax, and another for a canoni- 
cal system specifying translation. Each of these abbrevia- 
tions can be specified by a defining canonical system speci- 
fying a set of ordered pairs, where the first element of 
each pair is an abbreviated canonical system, and the second
element is the corresponding unabbreviated canonical system.


*A canonical system derivation can lead to much more
complicated structural descriptions than those that can be
represented in tree-like form. I have not studied this issue.




The productions specifying the six abbreviations
introduced to canonical systems are given in Appendix 1.3c. For
example, productions 15.1 and 15.2 in


15.1 WF PROD<p→c;> → ABR1 P:P<p→c;:p→c;>;
15.2 WF PROD<p→c;>, ABR1 P:P<p→s;:t> → ABR1 P:P<p→c,s;:p→c;t>;
15.3 WF ATOM PROD<c;> → ABR1 AP:AP<c;:c;>;
15.4 WF ATOM PROD<c;>, ABR1 AP:AP<s;:t;> → ABR1 AP:AP<s,c;:t;c;>;
15.5 ABR1 CS:CS<Λ:Λ>;
15.6 ABR1 CS:CS<c:d>, ABR1 P:P<p:q> → ABR1 CS:CS<cp:dq>;
15.7 ABR1 CS:CS<c:d>, ABR1 AP:AP<p:q> → ABR1 CS:CS<cp:dq>;


specify a set of ordered pairs "ABR1 P:P", where the first
element is a production of the form "p→c_1, c_2, ..., c_n;" and
the second element is the corresponding unabbreviated
productions "p→c_1; p→c_2; ... p→c_n;". Productions 15.3 and 15.4
augment this set to include atomic productions, and
productions 15.5 through 15.7 specify the abbreviation for an entire
canonical system.
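
The effect of the pairs "ABR1 P:P" and "ABR1 AP:AP" can be sketched in Python as a function from an abbreviated production to its unabbreviated productions. The sketch is an assumption-laden illustration, not a transcription of productions 15: the names split_top_level and expand_abr1 are hypothetical, conclusions are assumed to be separated by commas outside any brackets, and the arrow is assumed not to occur as a quoted object symbol.

```python
ARROW = "→"


def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside <...>."""
    pieces, depth, current = [], 0, []
    for ch in text:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        if ch == sep and depth == 0:
            pieces.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    pieces.append("".join(current).strip())
    return pieces


def expand_abr1(production):
    """Map "p→c_1, ..., c_n;" to ["p→c_1;", ..., "p→c_n;"]."""
    body = production.strip().rstrip(";")
    if ARROW in body:
        premises, conclusions = body.split(ARROW, 1)
        prefix = premises.strip() + " " + ARROW + " "
    else:
        prefix, conclusions = "", body  # an atomic production
    return [prefix + c + ";" for c in split_top_level(conclusions)]


print(expand_abr1("A<x> → B<x>, C<x,y>;"))
print(expand_abr1("DIGIT<0>, DIGIT<1>;"))
```

Because each abbreviated production determines its expansion and the expansion can be regrouped by common premises, the mapping is reversible in the sense required of an abbreviation.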


Similarly, productions 16 through 20 specify the other
five abbreviations to canonical systems.* Productions 21 and
22 specify abbreviations used in defining ALGOL/60 and will
be discussed in the chapter on ALGOL/60. Finally, production
23 specifies the rule for converting some string (presumably
a well-formed abbreviated canonical system) that is asserted
to be a member of the set "ABR CANONICAL SYSTEM STR" into the
corresponding member of the set "CANONICAL SYSTEM STR" (the
unabbreviated equivalent of the abbreviated canonical system).*

*To apply abbreviation 20, the abbreviation for a canonical
system specifying syntax, a production of the form "CS
PREDICATES<p_1 ... p_n>;", where the p_i, 1≤i≤n, are the
unabbreviated predicates for the canonical system, must be added to
productions 20.

To apply abbreviation 21, the abbreviation for a canonical
system specifying translation, (a) the productions and
premises occurring in the canonical system for syntax but not in
the canonical system for translation must be added to the
canonical system for translation, and (b) atomic formulas with
identical first predicate parts from identically numbered
productions from the canonical systems for the syntax and
translation must be written together in the canonical system
for translation and separated by "//".

*The order in which abbreviations are removed from an
abbreviated canonical system will generally depend on the
abbreviations introduced. Production 23 defines one order in which
the abbreviations introduced in this dissertation can be
removed. Furthermore, any premise in production 23 that
refers to an abbreviation not used in a particular abbreviated
canonical system can be removed.

For example, by asserting that the abbreviated canonical
system of Appendix 1.1b is an abbreviated canonical system
(i.e., by adding the production asserting that the canonical
system of Appendix 1.1b is a member of the set "ABR CANONICAL
SYSTEM STR"), the productions of Appendix 1.3c can be used to
derive the conclusion that the canonical system of Appendix
1.1a is its corresponding unabbreviated equivalent (i.e., the
canonical system of Appendix 1.1a is a member of the set
"CANONICAL SYSTEM STR"). Similarly, by asserting that the
canonical system of Appendix 1.2b is a member of the set "ABR
CANONICAL SYSTEM STR", production 23 can be used to derive the
conclusion that the canonical system of Appendix 1.2a is its
unabbreviated equivalent.** In general, by

(a) specifying the sets of ordered pairs defining
some abbreviations, and

(b) adding a production like production 23 defining
the rule for converting an abbreviated canonical
system into its unabbreviated equivalent,

a defining canonical system can be used to generate the
unabbreviated equivalent of any abbreviated canonical system.
Moreover, having generated the equivalent unabbreviated
canonical system, the productions of Appendix 1.3a and 1.3b
can then be used to derive strings specified by the
canonical system.

**As mentioned previously, an atomic production specifying the
unabbreviated predicates of an abbreviated canonical system
specifying syntax must be added to the defining canonical
system to generate the correct unabbreviated canonical system,
and the productions of the abbreviated canonical systems
specifying syntax and translation must be combined (according
to the rules given earlier) to generate the complete
unabbreviated canonical system specifying translation.

The productions of Appendix 1.3 are written using only
the first two abbreviations to the basic notation. To define
Appendix 1.3 using only the basic notation, the user could
write a third canonical system, which would consist of simply
(a) a production asserting that the canonical system of
Appendix 1.3 is a member of the set "ABR CANONICAL SYSTEM STR",
(b) productions 15 and 16 of Appendix 1.3 (these productions
contain no abbreviations), and (c) the production "ABR CANONICAL
SYSTEM STR<a>, ABR2 CS:CS<a:b>, ABR1 CS:CS<b:c> → CANONICAL
SYSTEM STR<c>;". The user would then have a series of three
canonical systems. The first (abbreviated) canonical
system (e.g., Appendices 1.1b or 1.2b) would define the
allowable strings in some source language. The
second canonical system would define the rules for forming
the first canonical system, the rules for deriving strings
specified by the first canonical system, and the rules for
converting the first canonical system into the basic notation.
The third canonical system would define the rules for
converting the second canonical system into the basic notation.
Thus, the series of canonical systems would ultimately be
defined using only the basic notation. In general, a user
may write a series of canonical systems to define the rules
for constructing and using other canonical systems; in order
for the series to be defined using only the basic canonical
system notation, only the last member of the series need be
written in the basic notation.

Note that productions 15 and 16 of Appendix 1.3 could 


be copied unchanged in the third canonical system. These 


productions formalize rules that are applicable to two 
canonical systems independently of their relative positions 
in a series of canonical systems. In fact, these productions 
can be copied and applied to the canonical system in which 


they themselves are given. 


User-Coined Abbreviations: 


Defining canonical systems provides a writer of a
canonical system with a formal mechanism for introducing his own
abbreviations to the notation. For example, consider the
productions (from the canonical system of ALGOL/60):


PRIMARY<p> → TERM<p>;
PRIMARY<p>, MULT OP<m>, TERM<t> → TERM<tmp>;


The user may wish to abbreviate these productions:

PRIMARY<p>, MULT OP<m> → TERM<ALTSEQ(p m)>;


Productions 21 of Appendix 1.3c specify this abbreviation (as
well as other variants of this abbreviation). Thus by simply
adding new productions to the canonical system defining the
conversion of an abbreviated canonical system to unabbreviated
form, the notation for canonical systems can be tailored to
fit a particular application.


2.3 Discussion 


Canonical systems have placed under a single framework
the complete definition of the syntax and translation of a
language. The formalism was used to specify all legal
programs, their translations into assembler language, the rules
for deriving legal programs and their translations, and the
rules for removing abbreviations from the specifications.
Not once was it necessary to introduce concepts outside
canonical systems; although some complexity was added to the
formalism by introducing abbreviations to the basic notation,
even the abbreviations were ultimately defined in terms of
the basic formalism.

It is important to develop languages whose descriptions 


are concise. The Backus-Naur form specification of the ALGOL/60 




subset and the English sentence describing the context-sensi- 
tive requirement provide one very concise and easily under- 
standable description of the syntax of the subset. The 
canonical system of Appendix 1.1 has, in fact, been modeled 
after this description. Productions 1 through 5 correspond 
(except for the auxiliary elements generating the lists of 
used and declared variables) to the Backus-Naur form produc- 
tions; the premise "IN<u:v>" in production 5 and the defini- 
tion of the predicate "IN" formalize the context sensitive 
restriction stated in English. 

The canonical system of Appendix 1.1 is not much more
lengthy than the Backus-Naur form definition of the subset
and the associated English sentence describing the
context-sensitive restriction. Like Backus-Naur form, the language
of canonical systems is readable. On the other hand,
canonical systems have the added power to characterize completely
both the syntax of a language and its translation into a
target language, without resorting to the English language.
Moreover, the notation for canonical systems is not fixed.
By changing or adding productions to a defining canonical
system, the user can alter or abbreviate the notation for a
defined canonical system to fit a particular language.

I wish to point out two additional features of the
canonical systems of Appendices 1.1 and 1.2. First, barring
any inadvertent errors, the canonical systems describe a set
of ALGOL/60 programs and assembler language programs that
will run on a computer when translated by an ALGOL/60 compiler
or System/360 assembler. Second, the specification of the
comment entries in the assembler language statements was
provided not only to aid the reader. The comments are
meaningful context-sensitive strings in the English language. The
specification of these strings was handled as easily as the 
specification of the strings in assembler language. The 
specification of the strings in the English language illus- 
trates the use of canonical systems to specify the entire 
operation of a translator, including the specification of 
meaningful comments. Moreover, it suggests the capacity of 
canonical systems to define string transformations in lan- 
guages other than computer programming languages. 

One use of canonical systems is in the development of a 
generalized translator for computer languages, i.e., a trans- 
lator that is independent of both source and target languages. 
Canonical systems define a set by specifying rules for 
generating its members. To use a canonical system as a lan- 
guage for writing translators, an algorithm to recognize 
strings specified by a canonical system and output associated 
strings is needed. No algorithm for recognizing and construct- 
ing strings specified by a canonical system is presented in 
this dissertation. However, one algorithm for canonical
systems has been devised and implemented by Alsop.

Several important issues for using canonical systems in
a generalized translator have not been studied. One critical
issue is the development of a restriction on canonical
systems to define only recursive sets rather than recursively 
enumerable sets. Theoretically, an algorithm for recognizing 
a string defined by a canonical system exists only if the set 
of strings defined by the canonical system is recursive. 
Other critical issues include speed of translation, recovery 
in case of an error in a source language program, and code 
optimization of target language programs. I expect that 
modifications to the basic formalism presented here will be 
necessary to use canonical systems in a generalized trans- 
lator. 

The notion of defining canonical systems unfolds several
possibilities for using canonical systems as a tool for working
with computer languages. Just as a canonical system allows
a user to change a source or target language construction by
simply changing the productions specifying the construction,
a defining canonical system allows the user to change the
definition or use of a defined canonical system by simply
changing productions of the defining canonical system.
Although only rules for removing abbreviations from a canonical
system and rules for deriving strings specified by a canoni- 
cal system have been defined here, defining canonical systems 
may provide a flexible mechanism for embedding many other 
rules for defining and manipulating computer languages. 

As mentioned earlier, the results of this chapter apply
to any recursively enumerable set. Any function or relation
that is recursively enumerable can be specified by a
canonical system. Canonical systems can be used to express
algorithms and string transformations of a much different nature
from those given here. The notion of defining canonical
systems adds to the basic formalism a facility for allowing
a user to formalize his own rules for defining and
manipulating strings and their canonical systems. The modifications
to the basic formalism presented here have been directed
towards the application of canonical systems to define the
syntax and translation of a language. But more importantly,
canonical systems provide a definitional facility that the
user has the freedom to tailor according to his own
application and style.




CHAPTER III


EXTENDED MARKOV ALGORITHMS AND λ-CALCULUS:
A COMBINED FORMALISM USED AS THE BASIS
FOR A TARGET LANGUAGE FOR DEFINING SEMANTICS


This chapter presents a formal language (henceforth 
referred to as the target language) quite different from con- 
ventional machine or assembler language for defining the 
semantics of a computer language. 

The semantics of a language can be defined as the set of
rules relating the strings in a language to the behavior or
objects that the strings denote. The behavior or object that
a string denotes can be described by a string in some other
language whose meaning is presumably understood. This approach
to defining the semantics of computer languages will be taken
in this chapter, namely, the presentation of a single language
(whose meaning is presumably understood) for defining the
semantics of multiple other languages. The semantics of a
given source language will be specified by defining the
translation of the language into the target language.

The semantics of the target language, however, will not
be left to an English language explanation in the text. The
semantics of the target language will be further explicated
in Section 3.2 by giving a formal definition of a machine*
that performs the computation indicated by a target language
string and produces the string denoted by the target language
string. (In defining the semantics of a computer language,
the word "computation" can be considered synonymous with the
word "behavior" and all "objects" in a computer language can
be considered as strings.) Thus the appeal to understanding
the semantics of a computer language will be ultimately
reduced to understanding the formalism in which the operation of
the target language evaluating machine is expressed.


*"Machine" in the sense of a set of logical rules.

Generally, the semantics of different languages will be 
specified by giving different translations into the target 
language while leaving the definition of the target language 
evaluating mechanism unchanged. On the other hand, the
definition of the evaluating mechanism can be changed to define
source language constructs that appear difficult to define in
the target language.*

The target language presented here is based on the
formalism of Markov algorithms (ref. 9), an extension to Markov
algorithms due to Caracciolo (refs. 10, 11, 12), and the
formalism of the λ-calculus of Alonzo Church (refs. 17, 18).
Extended Markov algorithms are used to define the primitive
functions in a computer language; the λ-calculus is used to
define new functions from the primitive functions. In a sense,
the target language draws upon the best of each formalism.
Markov algorithms explicate the notion of an algorithm
operating on a string and are especially well-suited to the
definition of primitive functions transforming strings into
new strings. The λ-calculus explicates the notion of a
function and is especially well-suited to the definition of
new functions from the primitive functions.


*This was done to define indirect addressing in SNOBOL/1.

The target language has several important properties.
The language is formally based, and theorems regarding the
completeness of the formalisms to define the set of all
"computable" functions exist (refs. 31, 37). The language is
independent of the characteristics of existing computers. The
basic notation for the target language is simple. Probably
most importantly, the correspondence between many computer
languages and the target language is somewhat simpler than the
correspondence between computer languages and conventional
machine or assembler languages.


3.1 The Target Language 


3.1a Extended Markov Algorithms

Markov Algorithms: 

Let A be an alphabet of characters, called the object
alphabet, and let "→", "·" and "Λ" be characters not in A.
A Markov algorithm is a finite list of substitution rules of
the form

     s₁ →(·) t₁
     s₂ →(·) t₂
        ...
     sₙ →(·) tₙ

where the sᵢ and tᵢ, 1≤i≤n, are either "Λ" or strings of
object alphabet characters, and "(·)" indicates the possible
occurrence of a "·" after the "→". The symbol "Λ" denotes
the null string.

A Markov algorithm of the above form when applied to an
object string X is taken to mean:


(a) Look down among the substitution rules for the
    first rule such that sᵢ occurs in X.

(b) If such a rule is found, replace the leftmost
    occurrence of sᵢ in X by the string tᵢ. If a "·" occurs
    after the "→" in the substitution rule, terminate
    the algorithm. Otherwise repeat the application of
    the algorithm to the newly formed string.

(c) If no such rule is found, terminate the algorithm.


For example, the Markov algorithm


     B → D
     C → F
     O → I


transforms the string "COBBLER" into the string "FIDDLER",
whereas the Markov algorithm


     B → D
     C →· T
     O → I


transforms the string "COBBLER" into the string "TODDLER".
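
Steps (a) through (c) can be sketched as a short interpreter. This is a minimal illustration, not part of the original formalism: the function name `run_markov`, the rule triples `(lhs, rhs, terminal)`, and the `max_steps` safety bound are assumptions introduced here, and the empty Python string stands in for the null string.

```python
def run_markov(rules, text, max_steps=10_000):
    """Apply a Markov algorithm given as (lhs, rhs, terminal) triples.

    The empty string plays the role of the null string.
    max_steps is an assumed safety bound, not part of the formalism.
    """
    for _ in range(max_steps):
        for lhs, rhs, terminal in rules:        # step (a): first applicable rule
            pos = text.find(lhs)                # leftmost occurrence of lhs
            if pos >= 0:
                # step (b): replace the leftmost occurrence
                text = text[:pos] + rhs + text[pos + len(lhs):]
                if terminal:
                    return text
                break                           # start again from the first rule
        else:
            return text                         # step (c): no rule applies
    raise RuntimeError("step bound exceeded")


fiddler = run_markov(
    [("B", "D", False), ("C", "F", False), ("O", "I", False)], "COBBLER")
toddler = run_markov(
    [("B", "D", False), ("C", "T", True), ("O", "I", False)], "COBBLER")
print(fiddler, toddler)
```

The only difference between the two runs is the terminal flag on the second rule, which stops the algorithm before the rule for "O" is ever reached.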
Consider the following Markov algorithm for taking a
parenthesized string of letters from the alphabet {I,O,N,X}
and producing a string where the initial letters are reversed.
(Here the character "*" is used as a marker, and the object
alphabet consists of the characters {I O N X ( ) *}.)




     II*  →  I*I
     IO*  →  O*I
     IN*  →  N*I
     IX*  →  X*I
     OI*  →  I*O
     OO*  →  O*O
     ON*  →  N*O
     OX*  →  X*O
     NI*  →  I*N
     NO*  →  O*N
     NN*  →  N*N
     NX*  →  X*N
     XI*  →  I*X
     XO*  →  O*X
     XN*  →  N*X
     XX*  →  X*X
     (I*  →  I(
     (O*  →  O(
     (N*  →  N(
     (X*  →  X(
     ()   →· Λ
     )    →  *)


A Markov algorithm for reversing a parenthesized
string of letters {I O N X}




This algorithm when applied to the string "(NOXIN)"
successively transforms it into the following strings


(NOXIN) → (NOXIN*) → (NOXN*I) → (NON*XI) → (NN*OXI)
→ (N*NOXI) → N(NOXI) → N(NOXI*) → N(NOI*X)
→ N(NI*OX) → N(I*NOX) → NI(NOX) → NI(NOX*)
→ NI(NX*O) → NI(X*NO) → NIX(NO) → NIX(NO*)
→ NIX(O*N) → NIXO(N) → NIXO(N*) → NIXON()
→ NIXON


Even quite simple algorithms like the above become
exceedingly lengthy when expressed in the Markov formalism. If the
alphabet above included all 26 letters in the English alphabet,
the Markov algorithm for reversing the letters in a string
would require 704 substitution rules. To alleviate this
growth, Caracciolo di Forino (refs. 10, 11, 12), in developing
a Markov algorithm based language called PANON, introduced the
notion of a "string variable" as an extension to Markov
algorithms.
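
The rule counts can be checked mechanically. The sketch below generates the reversal rules for an arbitrary letter set, counts them, and runs the four-letter version; the helper names `reversal_rules` and `run_markov` are illustrative assumptions, and the rule shapes are read off the example algorithm (one swap rule per ordered pair of letters, one rule per letter for moving past the parenthesis, and two closing rules).

```python
import string


def reversal_rules(letters):
    """Plain Markov rules, as (lhs, rhs, terminal), reversing '(...)'."""
    rules = [(c + d + "*", d + "*" + c, False)
             for c in letters for d in letters]            # n*n swap rules
    rules += [("(" + c + "*", c + "(", False) for c in letters]  # n rules
    rules += [("()", "", True), (")", "*)", False)]        # 2 closing rules
    return rules


def run_markov(rules, text):
    """First applicable rule, leftmost occurrence, repeat until done."""
    while True:
        for lhs, rhs, terminal in rules:
            pos = text.find(lhs)
            if pos >= 0:
                text = text[:pos] + rhs + text[pos + len(lhs):]
                if terminal:
                    return text
                break
        else:
            return text


small = len(reversal_rules("IONX"))
full = len(reversal_rules(string.ascii_uppercase))
reversed_word = run_markov(reversal_rules("IONX"), "(NOXIN)")
print(small, full, reversed_word)
```

For n letters the count is n*n + n + 2, which grows quadratically with the alphabet.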


Extended Markov Algorithms: 

Let A and V be disjoint alphabets of characters, called
respectively the object alphabet and variable alphabet, and
let "→", "·" and "Λ" be characters not in A or V. Let each
variable in V represent some pre-specified (possibly infinite)
set of object alphabet strings. The case where different
variables can represent different sets of object alphabet
strings is not excluded. An extended Markov algorithm is a
finite sequence of substitution rules of the form

     s₁ →(·) t₁
     s₂ →(·) t₂
        ...
     sₙ →(·) tₙ

where the sᵢ and tᵢ, 1≤i≤n, are either "Λ" or strings of object
alphabet and variable alphabet characters such that each
variable in tᵢ occurs also in sᵢ.

A string sᵢ represents the set of object alphabet
strings computed by concatenating in order from left to right
each of the object alphabet characters in sᵢ with any object
alphabet string represented by a variable in sᵢ. The set
represented by sᵢ is constrained in that each occurrence of the
same variable in sᵢ must be set to the same object alphabet
string in computing the set of object alphabet strings
that sᵢ represents. For example, if ℓ is a string variable
representing any member of the set {V W} and m is a string
variable representing any member of the set {Y ZZ}, the string
"ℓAmAℓ" represents any member of the set {VAYAV VAZZAV WAYAW
WAZZAW}.
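
The constraint that repeated occurrences of a variable take the same value can be illustrated by enumerating the represented set for finite variable sets. This is a sketch under assumptions: the function name `represented_set` is hypothetical, variables are single characters, and the lowercase letter "l" is used for the first variable of the example.

```python
from itertools import product


def represented_set(pattern, variables):
    """Enumerate the object strings a pattern represents.

    `variables` maps each single-character variable to a finite list of
    object strings; every other character of `pattern` stands for itself.
    """
    names = [ch for ch in dict.fromkeys(pattern) if ch in variables]
    result = set()
    for choice in product(*(variables[name] for name in names)):
        binding = dict(zip(names, choice))      # one value per variable
        result.add("".join(binding.get(ch, ch) for ch in pattern))
    return result


strings = represented_set("lAmAl", {"l": ["V", "W"], "m": ["Y", "ZZ"]})
print(sorted(strings))
```

Because one binding is chosen per variable rather than per occurrence, strings such as "VAYAW" are correctly excluded.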

A string sᵢ is said to occur within an object string X
if one or more of the strings represented by sᵢ occurs within
X. The "leftmost" occurrence of sᵢ in X is the string such
that first, (of the occurrences of sᵢ in X) the occurrence
begins with the leftmost object alphabet character, and second,
the occurrence is as short as possible.

An extended Markov algorithm of the above form when applied
to an object string X is taken to mean:


(a) Look down among the substitution rules for the first
    rule in which sᵢ occurs in X.

(b) If such a rule is found, replace the leftmost
    occurrence of sᵢ in X by the string obtained from tᵢ
    by replacing each variable in tᵢ by the string
    used in place of the variable in sᵢ. If a "·"
    occurs after the "→" in the substitution rule,
    terminate the algorithm. Otherwise repeat the
    application to the newly formed string.

(c) If no such rule is found, terminate the algorithm.*

It will be convenient to introduce a special symbol after the
sᵢ to mean that the string matched to sᵢ must extend to the
last character of the object string. I will use the symbol
"." for this purpose.**


*The transformation specified by a substitution rule of an
extended Markov algorithm is computable only if the string
variables represent recursive sets. This requirement is
discussed in detail by Caracciolo (Chap. 5, ref. 11). In
this dissertation all sets defined for string variables are
recursive.


**This convention can be viewed solely within the framework of
extended Markov algorithms by (a) replacing each "." after
the sᵢ by a special character not in the object alphabet, (b)
replacing each corresponding tᵢ with tᵢ followed by the
special character, (c) appending to each object string X the
special character, and (d) applying to the transformed object
string an algorithm that simply removes the special character.


For example, let s and s' be string variables representing
any string of English letters. The extended Markov
algorithm

     (1)  sI → sO

transforms the string "BINGO" into the string "BONGO", the
extended Markov algorithm

     (2)  XsXs'X → ss'

transforms the string "XABXCDX" into the string "ABCD", the
extended Markov algorithm

     (3)  sXs → X

transforms the string "QABXAB" into the string "QX", and
the extended Markov algorithm

     (4)  Xs. →  Λ
          sX  →· X

transforms the string "?VWXX?XBC" into the string "?XX?".*
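
The four examples can be reproduced with a small interpreter. The following is a sketch under stated assumptions rather than a definitive implementation: variables are single lowercase characters (so "t" stands in for s'), each variable's set is given as a Python regular expression, a rule is a tuple `(lhs, to_end, rhs, terminal)` where `to_end` encodes the "." convention, and the leftmost, shortest occurrence is found by brute force over start and end positions. All names are hypothetical.

```python
import re


def compile_lhs(lhs, variables):
    """Translate a left-hand side into a regex; a repeated variable
    becomes a backreference so that it must match the same string."""
    seen, parts = set(), []
    for ch in lhs:
        if ch not in variables:
            parts.append(re.escape(ch))
        elif ch in seen:
            parts.append(f"(?P={ch})")
        else:
            seen.add(ch)
            parts.append(f"(?P<{ch}>{variables[ch]})")
    return re.compile("".join(parts))


def leftmost_shortest(pattern, text, to_end):
    """Leftmost start first, then the shortest end (or only the end of
    the object string when the rule carries the '.' convention)."""
    n = len(text)
    for start in range(n + 1):
        for end in ([n] if to_end else range(start, n + 1)):
            match = pattern.fullmatch(text, start, end)
            if match:
                return match
    return None


def run_extended(rules, variables, text, max_steps=10_000):
    compiled = [(compile_lhs(lhs, variables), to_end, rhs, terminal)
                for lhs, to_end, rhs, terminal in rules]
    for _ in range(max_steps):                  # assumed safety bound
        for pattern, to_end, rhs, terminal in compiled:
            m = leftmost_shortest(pattern, text, to_end)
            if m:
                new = "".join(m.group(ch) if ch in variables else ch
                              for ch in rhs)
                text = text[:m.start()] + new + text[m.end():]
                if terminal:
                    return text
                break
        else:
            return text
    raise RuntimeError("step bound exceeded")


LETTER_STR = {"s": "[A-Z]+", "t": "[A-Z]+"}     # non-empty letter strings
r1 = run_extended([("sI", False, "sO", False)], LETTER_STR, "BINGO")
r2 = run_extended([("XsXtX", False, "st", False)], LETTER_STR, "XABXCDX")
r3 = run_extended([("sXs", False, "X", False)], LETTER_STR, "QABXAB")
r4 = run_extended([("Xs", True, "", False), ("sX", False, "X", True)],
                  LETTER_STR, "?VWXX?XBC")
print(r1, r2, r3, r4)
```

The brute-force search is quadratic in the length of the object string per rule; it is chosen here for fidelity to the "leftmost, then shortest" definition rather than for speed.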

More precisely, an extended Markov algorithm will be
specified in three parts:

(a) A statement listing some string variables and the
    names of the sets whose members the variables
    represent.

(b) A formal definition of the sets named in (a).

(c) A list of extended Markov algorithm substitution
    rules including possible occurrences of the
    defined string variables.

I will use statements of the form
"| a₁,a₂,...,aₙ ∈ A | b₁,b₂,...,bₙ ∈ B | ... | p₁,p₂,...,pₙ ∈ P |",
where the aᵢ, bᵢ, ..., and pᵢ are variables and the A, B, ...,
and P are the names of the sets, to denote that a₁ represents
members of the set named A, a₂ represents members of the set
named A, etc. I will use canonical systems to define the named
sets. Using this notation the above extended Markov algorithms
are more precisely stated


     | s,s' ∈ LETTER STR |


     LETTER STR<A>,<B>, ... ,<Z>;
     LETTER STR<a>,<b> → LETTER STR<ab>;


     (1)  sI → sO

     (2)  XsXs'X → ss'

     (3)  sXs → X

     (4)  Xs. →  Λ
          sX  →· X


*Note that the character "?" is not an English letter.


Consider again the algorithm for reversing any parenthesized
string of letters from the alphabet {I O X N}. Using
the following variable and set definitions


     | c,d ∈ LETTER |


     LETTER<I>,<O>,<N>,<X>;


the extended Markov algorithm for this string transformation
can now be simply given

     cd*  →  d*c
     (c*  →  c(
     ()   →· Λ
     )    →  *)


Note that by simply augmenting the set named "LETTER" (and
the object alphabet) to include all the letters of the English
alphabet, the same four extended Markov algorithm substitution
rules define the algorithm for reversing a string containing
all English letters, whereas 704 substitution rules are
required to define this transformation with a Markov algorithm.
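
Because the variables in this algorithm each stand for a single letter, the four rules map directly onto regular-expression substitutions. The sketch below is one possible rendering, with the set LETTER assumed to be the character class `[A-Z]`; the names `RULES` and `reverse_parenthesized` are illustrative and not taken from the text.

```python
import re

LETTER = "[A-Z]"    # assumed definition of the set named LETTER

# (pattern, replacement, terminal): the four rules of the algorithm.
RULES = [
    (rf"({LETTER})({LETTER})\*", r"\2*\1", False),   # cd* -> d*c
    (rf"\(({LETTER})\*", r"\1(", False),             # (c* -> c(
    (r"\(\)", "", True),                             # ()  -> null, terminal
    (r"\)", "*)", False),                            # )   -> *)
]


def reverse_parenthesized(text):
    """Apply the first matching rule at its leftmost match, then repeat."""
    while True:
        for pattern, replacement, terminal in RULES:
            new_text, count = re.subn(pattern, replacement, text, count=1)
            if count:
                text = new_text
                if terminal:
                    return text
                break
        else:
            return text


nixon = reverse_parenthesized("(NOXIN)")
olleh = reverse_parenthesized("(HELLO)")
print(nixon, olleh)
```

Enlarging the alphabet only changes the definition of `LETTER`; the rule list stays at four entries.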




Even with the extension to Markov algorithms given 
above, algorithms expressed in the extended Markov formalism 
often become exceedingly lengthy. One frequently occurring 
source of this lengthening is a requirement to construct the 
functional composition of two or more algorithms. Although 
Markov's monograph defines the additional substitution rules 
for taking two Markov algorithms and constructing the Markov 
algorithms defining their functional composition, the number 
of resulting substitution rules can be enormous. For example, 
for 2 Markov algorithms over an object alphabet consisting of 
all English letters, 1,457 substitution rules (Section 3.3, 
ref. 9) must be added to the algorithms to produce the algo- 
rithm representing their functional composition. Although 
by using the extension to Markov algorithms the number of 
additional rules could be reduced to 7, an algorithm composed 
by several functional compositions would quickly require many 
substitution rules and would be correspondingly difficult to 
understand. 


On the other hand, Church's λ-calculus (refs. 17, 18), a
formalism that makes precise the notion of a function and its
properties, is ideally suited to handle the concept of
functional composition. The next section presents the formalism
of the λ-calculus, and the subsequent section discusses the
embedding of the formalism of extended Markov algorithms within
the formalism of the λ-calculus. This combined formalism
will provide the heart of this dissertation's target language
for defining semantics.


3.1b The λ-Calculus*


The λ-calculus is a formalism for writing certain classes
of expressions. One interpretation (the interpretation taken
here) of the formalism is as an explication of ideas about
the specification and application of functions. Let C and
V be disjoint sets of symbols, not including the symbols
{λ . ( ) ␢}, where "␢" denotes a string of one or more blank
spaces. The set C will be called the set of constants. The
set V will be called the set of variables. A well-formed
expression in the λ-calculus is any string defined
(recursively) by the following rules:


(a) If p is a variable, or p is a constant, then p is
    a well-formed expression.

(b) If E and F are well-formed expressions, then (E F)
    is a well-formed expression.

(c) If v is a variable and E is a well-formed
    expression, then λv.E is a well-formed expression.


For example, if C comprises the symbols {3 SQ} and V comprises
the symbol {X}, some example expressions are "3", "(SQ 3)"
and "λX.(SQ X)". An expression of the form (E F) is called
a combination, and the expressions E and F in (E F) are called
respectively the operator and operand of the combination. An
expression of the form λv.E is called a λ-expression, and the
expression E in λv.E is called the body of the λ-expression.
Here, a λ-expression of the form λv.E will be interpreted as
a representation of the function mapping the variable v into
the expression E.


*The terminology in this chapter is due mostly to Church and
Landin.

An occurrence of a variable in a well-formed expression
is distinguished as "free" or "bound" according to the
following rules:


(a) If E is an expression consisting only of a variable,
    the occurrence of the variable in E is free.

(b) If E and F are expressions, an occurrence of a
    variable in (E F) is free or bound according as it
    is free or bound in E or F.

(c) If v is a variable and E is an expression, all
    occurrences of v in λv.E are bound while an occurrence
    of a variable different from v in λv.E is free or
    bound according as it is free or bound in E.

For example, in the expression "λX.(F X)", where "F" and "X"
are variables, the occurrence of "F" is free and the
occurrences of "X" are bound.
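
The three formation rules and the three free/bound rules translate directly into a recursive data type and a recursive function. The class names `Var`, `Const`, `Comb`, `Lam` and the function `free_vars` below are illustrative choices, not notation from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:            # rule (a): a variable
    name: str


@dataclass(frozen=True)
class Const:          # rule (a): a constant
    name: str


@dataclass(frozen=True)
class Comb:           # rule (b): a combination (E F)
    operator: object
    operand: object


@dataclass(frozen=True)
class Lam:            # rule (c): a lambda-expression with bound variable and body
    var: str
    body: object


def free_vars(expr):
    """Names of the variables with at least one free occurrence."""
    if isinstance(expr, Var):
        return {expr.name}
    if isinstance(expr, Const):
        return set()
    if isinstance(expr, Comb):
        return free_vars(expr.operator) | free_vars(expr.operand)
    return free_vars(expr.body) - {expr.var}    # the binder removes its variable


example = Lam("X", Comb(Var("F"), Var("X")))
free = free_vars(example)
print(free)
```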

Church introduces rules for transforming expressions.
Using these rules, some expressions can be transformed into
a "principal normal form." The principal normal form of an
expression may be viewed as a "canonical" or standard
representation of the value of the expression. Because of the
introduction of assignment and goto expressions into the
target language to be presented later, the rules for
transforming a target language expression into normal form will
not always hold. Instead, the value of a target language
expression will be defined in this dissertation by an
extended Markov algorithm specification of a machine that
mechanically converts an expression into a canonical
representation of the value of the expression.

This machine will be defined formally in section 2 of
this chapter. The operation of this machine for evaluating
λ-calculus expressions will be presented informally in this
section.

In general, the value of a constant or free variable is
the object denoted by the constant or variable. A list of
the values of the constants and free variables is called an
"environment." The value of a λ-expression is called a
"λ-closure" and consists of two parts: (a) the expression
itself, and (b) the environment in which the λ-expression
occurs, i.e., the list of the values of the constants and
free variables in the expression.

The value of a combination is the object computed by
evaluating its operand, evaluating its operator (using the
values of constants and free variables given by the
environment of the combination), and then applying the value of
the operator to the value of the operand. If the operator of a
combination is a λ-expression, the result of applying the
λ-expression to its operand is computed by (a) coupling the
bound variable of the λ-expression with the value of the
operand to which the λ-expression is being applied, (b) adding
this couple to the environment of the λ-expression, and
(c) evaluating the body of the λ-expression using this new
environment.


62 


Some example λ-calculus expressions are the following:


     3            λX.3            (λX.3 2)
     (SQ 3)       λX.(SQ X)       (λX.(SQ X) 3)
     X            λX.X            (λX.X 3)


If "2", "3" and "SQ" are constants denoting respectively the
integer two, the integer three, and the function mapping an
integer into its square, the nine expressions above denote


     the integer      the function mapping X        the integer
     three            into the integer three        three

     the integer      the function mapping X        the integer
     nine             (presumably an integer)       nine
                      into its square

     some object      the identity function         the integer
     X                                              three
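
The informal evaluation rules (environments, λ-closures, operand before operator) can be sketched as follows. This is an illustration only, not the machine defined later in the chapter: expressions are encoded as nested tuples, the environment is a Python dictionary, and the constant "SQ" is bound to a Python function. All of these encodings are assumptions made for the example.

```python
def evaluate(expr, env):
    """Evaluate a tuple-encoded expression in an environment (a dict)."""
    kind = expr[0]
    if kind in ("const", "var"):
        return env[expr[1]]                  # value comes from the environment
    if kind == "lam":
        return ("closure", expr, env)        # the body is not evaluated here
    _, operator, operand = expr              # kind == "comb"
    arg = evaluate(operand, env)             # operand first, as in the text
    fn = evaluate(operator, env)
    if isinstance(fn, tuple) and fn[0] == "closure":
        _, (_, var, body), saved_env = fn
        # couple the bound variable with the argument, extend the
        # closure's environment, then evaluate the body
        return evaluate(body, {**saved_env, var: arg})
    return fn(arg)                           # a primitive function


ENV = {"2": 2, "3": 3, "SQ": lambda n: n * n}


def const(name):
    return ("const", name)


a = evaluate(("comb", ("lam", "X", const("3")), const("2")), ENV)
b = evaluate(("comb", const("SQ"), const("3")), ENV)
c = evaluate(("comb", ("lam", "X", ("comb", const("SQ"), ("var", "X"))),
              const("3")), ENV)
d = evaluate(("comb", ("lam", "X", ("var", "X")), const("3")), ENV)
print(a, b, c, d)
```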


3.1c The Marriage of Extended Markov Algorithms to the
     λ-Calculus


This section embeds the formalism of extended Markov
algorithms within the formalism of the λ-calculus. The wedding
of these two formalisms will form the basis for the target
language that will be presented in Section 3.1d.

Let E be a set of strings representing extended Markov
algorithms, where the characters {[, ], |, and "} do not occur
in E. Let L be another set of strings, called the set of
literals, where the character ' does not occur in L. Let C
be a set of basic symbols, called the set of constants, where
each constant is either a string from E enclosed by the
brackets [ and ] or a string from L enclosed by the quotation
marks ' and '. Let V be another set of basic symbols, called
the set of variables, where each variable contains no
occurrence of {[, ], or '}. (Thus the sets C and V are disjoint.)
An expression in the combined formalism will consist of any
expression M such that each occurrence of a variable in M is
bound in M.

The extended Markov algorithms will be interpreted as 
definitions of primitive functions, the literals will be 
interpreted as representations of the objects upon which the 
primitive functions operate, and the variables will be inter- 
preted as names of primitive functions, literals, or functions 
of the primitive functions and literals. In the examples in 
the text, the quotation marks will often be omitted from 
constants that represent integers. 

Expressions in the λ-calculus are strings of basic
symbols, and hence to include an extended Markov algorithm
in the λ-calculus, it is necessary to have a linear
representation of an extended Markov algorithm. An extended
Markov algorithm of the form

     X
     D
     s₁ →(·) t₁
     s₂ →(·) t₂
        ...
     sₙ →(·) tₙ

where X is the statement listing the string variables in the
algorithm, and D is the definition of the sets named in X,
will therefore be represented

     [X D s₁ →(·) t₁ | s₂ →(·) t₂ | ... | sₙ →(·) tₙ]


For convenience, however, the statement X and the definition
D will generally be given separately from the list of
substitution rules in the algorithm. For example, consider the
following expression:

     λa.([B→D|C→F|O→I] a)

This expression can be used in combination with other
expressions to transform strings. For example, the expression

     (λa.([B→D|C→F|O→I] a) 'COBBLER')

successively takes on the values

     ([B→D|C→F|O→I] 'COBBLER')

and finally

     FIDDLER
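
As an illustration of treating the linear representation as data, the sketch below parses a variable-free algorithm written in bracketed form and wraps it in a Python function, mirroring the λ-expression above. The function names are hypothetical, and the parser handles only the simple case shown here: no string variables, no quoted substrings, no statement X or definition D.

```python
def parse_linear(linear):
    """Parse '[B→D|C→F|O→I]' into (lhs, rhs, terminal) triples."""
    rules = []
    for rule in linear[1:-1].split("|"):
        lhs, rhs = rule.split("→", 1)
        terminal = rhs.startswith("·")       # a "·" after the arrow
        rhs = rhs[1:] if terminal else rhs
        rules.append(("" if lhs == "Λ" else lhs,
                      "" if rhs == "Λ" else rhs,
                      terminal))
    return rules


def apply_algorithm(linear, text):
    """Run the parsed algorithm on an object string."""
    rules = parse_linear(linear)
    while True:
        for lhs, rhs, terminal in rules:
            pos = text.find(lhs)
            if pos >= 0:
                text = text[:pos] + rhs + text[pos + len(lhs):]
                if terminal:
                    return text
                break
        else:
            return text


def transform(a):
    """Python counterpart of the expression binding a and applying the rules."""
    return apply_algorithm("[B→D|C→F|O→I]", a)


result = transform("COBBLER")
print(result)
```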


In defining the semantics of computer languages, it
will be convenient to consider the symbols {→ · Λ [ ] |} as
object alphabet symbols in an extended Markov algorithm. I
therefore adopt the convention that any string (not including
the symbol ") enclosed by the quotation marks " and "
in an extended Markov algorithm is to be considered as an
object alphabet string. This use of quotation marks allows
us to consider extended Markov algorithms whose object
strings are themselves extended Markov algorithms. This
point will be discussed in the definition of the primitive
function "CAT", to be presented shortly.

The basic notation for the combined formalism is not
especially suited to digestion by humans. To make the
notation more palatable, I will introduce a series of alternate
notations for writing expressions in the combined formalism.
The alternate notations will be given for convenience and
conciseness in communicating the expressions to humans. The
alternate notations for the λ-calculus, and the λ-calculus
definitions for conditional expressions and recursive
functions are for the most part due to Landin.


Alternate Notations for Extended Markov Algorithms: 


The linear representation of an extended Markov algorithm
is difficult to visualize. Accordingly, I will generally use
the notation

     [ s₁ →(·) t₁ ]
     [ s₂ →(·) t₂ ]
     [    ...     ]
     [ sₙ →(·) tₙ ]

(where the variable and set definitions for the algorithm
will be given separately) in place of the strict linear
representation of an extended Markov algorithm in the
λ-calculus. For example, the expression

     λa.([B→D|C→F|O→I] a)

will be written

          [ B → D ]
     λa.( [ C → F ]  a)
          [ O → I ]

The Function CAT: 


Let s be a string variable representing any string of
characters and consider the following expression

     λa.([s. →· "[Λ→·" s "]"] a)

This expression defines a function mapping the value of the
variable a into the extended Markov algorithm [Λ →· a],
where "a" here denotes the value of the variable a. This
extended Markov algorithm when applied to an object string
concatenates the string value of a to the object string. The
function above will be called "CAT". For example, the
expression ((CAT 'HELLO ') 'THERE') successively takes on the
values:

     ((λa.([s. →· "[Λ→·" s "]"] a) 'HELLO ') 'THERE')
     (([s. →· "[Λ→·" s "]"] 'HELLO ') 'THERE')
     ([Λ →· HELLO ] 'THERE')
     HELLO THERE

Similarly, the expression ((CAT ((CAT 'HOW ') 'ARE ')) 'YOU')
takes on the value "HOW ARE YOU". Note that the extended
Markov algorithm [s. →· "[Λ→·" s "]"] maps its object string
into another extended Markov algorithm, and thus extended
Markov algorithms have the ability to define functionals,
i.e., functions mapping an argument into a new function.
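
The two-stage behavior of CAT (first build the text of an algorithm, then run that algorithm) can be mimicked in Python. In this sketch the outer rule is replaced by plain string construction, and the inner algorithm is run by a minimal interpreter for variable-free rules; `apply_algorithm` and `cat` are hypothetical names, and the quoting conventions of the text are not modeled, so arguments containing "|" or "→" are out of scope.

```python
def apply_algorithm(linear, text):
    """Run a variable-free algorithm in linear form, e.g. '[Λ→·HELLO ]'."""
    rules = []
    for rule in linear[1:-1].split("|"):
        lhs, rhs = rule.split("→", 1)
        terminal = rhs.startswith("·")
        rules.append(("" if lhs == "Λ" else lhs,
                      rhs[1:] if terminal else rhs,
                      terminal))
    while True:
        for lhs, rhs, terminal in rules:
            pos = text.find(lhs)             # the null string occurs at position 0
            if pos >= 0:
                text = text[:pos] + rhs + text[pos + len(lhs):]
                if terminal:
                    return text
                break
        else:
            return text


def cat(a):
    """Map a string to the algorithm that places it before an object string."""
    algorithm = "[Λ→·" + a + "]"             # the text built by the outer rule
    return lambda obj: apply_algorithm(algorithm, obj)


greeting = cat("HELLO ")("THERE")
question = cat(cat("HOW ")("ARE "))("YOU")
print(greeting)
print(question)
```

The value returned by `cat` is itself a function, which is the sense in which CAT is a functional.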

In defining the semantics of a computer language, it
will frequently be necessary to concatenate strings to
produce a string that represents an extended Markov algorithm
or a string to which an extended Markov algorithm is applied.
It will be convenient not to state explicitly the
concatenation of strings in these cases, and I therefore introduce
the following alternate notation.


     Let "CAT" be the function as defined above,

     let Xᵢ, 1≤i≤n, be expressions, and

     let ((CAT ... ((CAT ((CAT X₁) X₂)) X₃)) ... Xₙ) be
     an expression whose value is an extended Markov
     algorithm or a string to which an extended
     Markov algorithm is applied. The Xᵢ can be
     written directly in the form of the extended
     Markov algorithm or the concatenated string to
     which an extended Markov algorithm is applied.


Thus, for example, the expressions

     λπ.λα.λβ.(((CAT ((CAT ((CAT ((CAT '[TRUE →·') α)) ' | FALSE →·'))
          β)) ']') π)

     λα.λβ.([TRUE/TRUE →· TRUE | TRUE/FALSE →· FALSE |
          FALSE/TRUE →· FALSE | FALSE/FALSE →· FALSE]
          ((CAT ((CAT α) '/')) β))

can be written

     λπ.λα.λβ.([TRUE →· α | FALSE →· β] π)

     λα.λβ.([TRUE/TRUE →· TRUE | TRUE/FALSE →· FALSE |
          FALSE/TRUE →· FALSE | FALSE/FALSE →· FALSE] α/β)

or further rewritten using the previously given alternate
notation

                [ TRUE  →· α ]
     λπ.λα.λβ.( [ FALSE →· β ]  π)


             [ TRUE/TRUE   →· TRUE  ]
     λα.λβ.( [ TRUE/FALSE  →· FALSE ]  α/β)
             [ FALSE/TRUE  →· FALSE ]
             [ FALSE/FALSE →· FALSE ]

The first expression defines a function* that when successively
applied to three arguments produces the value of the variable
α if the value of the variable π is "TRUE" and produces the
value of the variable β if the value of the variable π is
"FALSE". The second expression defines a boolean-valued
function that when successively applied to two boolean-valued
arguments produces the value "TRUE" if both arguments have
the value "TRUE" and produces the value "FALSE" if either
argument has the value "FALSE". The first expression will
later be used to define conditional expressions. The second
expression will later be used to define the function for
producing the logical "and" of two arguments.


*Greek letters will generally not occur as object strings for
extended Markov algorithms. I will therefore use Greek
letters in an extended Markov algorithm or the string to
which it is applied to denote the symbols that are bound
variables. Thus, in writing the strict representation of
the algorithm or its object string in terms of λ-calculus
expressions, strings not containing Greek letters are to
be quoted and the Greek letters are not to be quoted.
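
Both functions are table lookups expressed as terminal substitution rules. The following sketch shows the same idea in Python; the names `run_terminal`, `cond` and `logical_and` are illustrative, truth values are the strings "TRUE" and "FALSE", and the arguments are plain strings rather than arbitrary target language values.

```python
def run_terminal(rules, text):
    """Apply the first rule whose left side occurs; every rule is terminal."""
    for lhs, rhs in rules:
        pos = text.find(lhs)
        if pos >= 0:
            return text[:pos] + rhs + text[pos + len(lhs):]
    return text


def cond(pi, alpha, beta):
    # The algorithm is built from the arguments alpha and beta,
    # then applied to the object string pi.
    return run_terminal([("TRUE", alpha), ("FALSE", beta)], pi)


def logical_and(alpha, beta):
    table = [("TRUE/TRUE", "TRUE"), ("TRUE/FALSE", "FALSE"),
             ("FALSE/TRUE", "FALSE"), ("FALSE/FALSE", "FALSE")]
    return run_terminal(table, alpha + "/" + beta)   # object string alpha/beta


selected = cond("TRUE", "0", "1")
both = logical_and("TRUE", "TRUE")
mixed = logical_and("TRUE", "FALSE")
print(selected, both, mixed)
```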

Note that the first expression above constructs an
extended Markov algorithm from literal strings and bound
variables. The notion of a bound variable lends itself
immediately to extended Markov algorithms embedded within the
λ-calculus and allows the construction of extended Markov
algorithms that depend on the values of the variables to
which the algorithms are applied. This compatibility
between the married formalisms greatly simplified the
definitions of the primitive functions for SNOBOL/1 and ALGOL/60.


Alternate Notations for the λ-Calculus:


The basic notation for defining and applying functions
in the λ-calculus is somewhat awkward for those accustomed
to writing functions in the conventional mathematical
notation. I thus introduce the following alternate notations.

Let F, V₁, V₂, ..., Vₙ be variables and M, Q, E₁, E₂,
..., Eₙ be expressions. Expressions of the form

     (a)  (...((λV₁.λV₂. ... λVₙ.M E₁) E₂) ... Eₙ)

     (b)  (λF.M λV₁.λV₂. ... λVₙ.Q)

     (c)  (...((F E₁) E₂) ... Eₙ)

can be written

     (a)  LET V₁,V₂, ... ,Vₙ = E₁,E₂, ... ,Eₙ
          IN M

     (b)  LET F(V₁,V₂, ... ,Vₙ) = Q
          IN M

     (c)  F(E₁,E₂, ... ,Eₙ)

where if M, Q, E₁, E₂, ..., or Eₙ are enclosed in parentheses,
the parentheses can be dropped. Thus, for example, the
expressions

     (λX.('SQ' X) 3)

     ((λX.λY.(('CAT' X) Y) 'HELLO ') 'THERE')

     (λCOND.(((COND 'TRUE') 0) 1) λπ.λα.λβ.([TRUE →· α | FALSE →· β] π))

can be written

     LET X = 3
     IN 'SQ' X


     LET X,Y = 'HELLO ', 'THERE'
     IN (('CAT' X) Y)


     LET COND(π,α,β) = ([TRUE →· α | FALSE →· β] π)
     IN COND('TRUE',0,1)


Conditional Expressions: 


Consider the function COND defined previously

     COND(π,α,β) = ([TRUE →· α | FALSE →· β] π)

This function selects the value of α if the value of π is
"TRUE" and the value of β if the value of π is "FALSE". For
example, the value of COND('TRUE',0,1) is the string "0".

Next consider the following expression from ALGOL/60

     IF A=0 THEN B*A ELSE B/A

and the (loosely written) expression in the combined
formalism

     COND(A=0,B*A,B/A)

where COND is defined as above. This expression does not
correctly mirror the ALGOL/60 expression. In ALGOL/60 the
expression B*A is evaluated only if the value of A is equal
to zero, and the expression B/A is evaluated only if the
value of A is not equal to zero. This order of evaluation
insures that B/A is not evaluated if the value of A is zero.
Now consider the following (loosely written) target language
expression

     (COND(A=0,λπ.B*A,λπ.B/A) 'Λ')

where π is a dummy variable. In evaluating this expression,
the function COND will be applied to its arguments, one of
the λ-expressions λπ.B*A or λπ.B/A will be selected, and then
the selected λ-expression will be applied to the operand 'Λ'.
Thus only the body of the selected λ-expression will be
evaluated.* The use of the dummy variable serves as a delaying
mechanism in evaluating expressions.
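
The delaying effect of the dummy variable has a direct analogue in any language that evaluates arguments before a call. The sketch below contrasts the two forms in Python; the names `cond`, `eager` and `delayed` are illustrative, Python booleans replace the strings "TRUE" and "FALSE", and `None` plays the role of the dummy operand.

```python
def cond(pi, alpha, beta):
    """Select alpha or beta; both arguments are already evaluated."""
    return alpha if pi else beta


def eager(a, b):
    # Both branches are evaluated before cond is entered,
    # so the quotient is computed even when a is zero.
    return cond(a == 0, b * a, b / a)


def delayed(a, b):
    # Each branch is wrapped in a function of a dummy variable;
    # only the selected function is applied to the dummy operand.
    chosen = cond(a == 0, lambda _dummy: b * a, lambda _dummy: b / a)
    return chosen(None)


try:
    eager(0, 5)
    eager_failed = False
except ZeroDivisionError:
    eager_failed = True

safe_zero = delayed(0, 5)
safe_nonzero = delayed(2, 6)
print(eager_failed, safe_zero, safe_nonzero)
```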

Conditional expressions of the above form will be used
repeatedly in defining the semantics of computer languages.
I therefore introduce the following alternate notation.

Let s₁, s₂, t₁, t₂, and t₃ be expressions. Expressions
of the form

     (COND(s₁,λπ.t₁,λπ.t₂) 'Λ')

and

     (COND(s₁,λπ.t₁,λπ.(COND(s₂,λπ.t₂,λπ.t₃) 'Λ')) 'Λ')

can be written

     s₁   => t₁
     ELSE => t₂

and

     s₁   => t₁
     s₂   => t₂
     ELSE => t₃

Similarly, this alternate notation can be extended to include
an arbitrary number of nested conditional expressions.


*Note, in forming a λ-closure, the body of the λ-expression is
not evaluated.


For example, the expression

     (COND(A=0,λπ.B*A,λπ.B/A) 'Λ')

can be written

     A=0  => B*A
     ELSE => B/A


3.1d The Target Language 


The combined formalism of extended Markov algorithms and
the λ-calculus presented in the previous section appears
sufficient to define fairly concisely many constructions in
computer languages. However, two common features of many
computer languages, that for assigning new values to variables
and that for transferring control to another statement in a
program, have evaded characterization in the combined formalism.
To handle this circumstance, the combined formalism will be
augmented with new expressions to mirror directly the
assignment of new values to variables and the transfer of evaluation
from one expression to another. The augmented version of the
combined formalism will comprise the target language of this
dissertation.




Sequences of Expressions: 


Before discussing the rules for forming well-formed
expressions in the target language, let us consider a
mechanism for defining a sequence of expressions, where each
expression E₁, E₂, ..., Eₙ in the sequence is to be evaluated
in the numerical order indicated by its numerical subscript.
Using the rule for evaluating the operand of a combination
before the operator of a combination, the target language
provides a device for handling a sequence of expressions.


Let X, E, E₁, E₂, ..., and Eₙ be expressions, and consider
the following λ-expression, called Γ

     λα.λβ.(β α)

When evaluated, the combination (Γ E) results in first
evaluating the expression E and then returning the value of the
λ-closure for λβ.(β α), where α is coupled with the value of
E. Next consider the combination

     [(Γ E) λπ.X]

where square brackets have been used here (for convenience)
in place of parentheses.* This combination is evaluated as
follows:


1. The λ-closure for λπ.X is computed.

2. The combination (Γ E) is computed, resulting in
   first evaluating E and then returning the λ-closure
   for λβ.(β α), where α is coupled with the value of
   E.

3. The value of the expression in 2 is applied to the
   value of the expression in 1, resulting in applying
   λπ.X to the value of E, which returns the value of X.


In particular, if X is the expression "π", this combination
results in returning the value of E.


*Square brackets will be used frequently in this section.
Strictly speaking, all square brackets should be replaced
by parentheses.


Next consider the expression 
[(Γ E_1) λπ.[(Γ E_2) λπ.π]]

This combination is evaluated as follows:

1. The λ-closure for λπ.[(Γ E_2) λπ.π] is computed.
Note that the value of E_2 is not computed in forming
the λ-closure.

2. The combination (Γ E_1) is computed, resulting in
first evaluating E_1 and then returning the λ-closure
for λβ.(β α).

3. The value of the expression in 2 is applied to the
value of the expression in 1, resulting in returning
the value of [(Γ E_2) λπ.π]. This evaluation
results in first computing the value of E_2 and then
returning the value of E_2.

Thus the evaluation of this expression results in first
evaluating E_1, then evaluating E_2, and finally returning the
value of E_2.
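
The sequencing device can be tried out in Python, under the following assumptions: `gamma` is a hypothetical rendering of λα.λβ.(β α), and `expr` is a hypothetical stand-in for an expression E_i whose evaluation is made observable by recording its name. Python evaluates the callee before the argument, whereas the target language evaluates the operand first; since forming a closure has no side effect, the observable order of E_1 and E_2 is the same.

```python
def gamma(alpha):
    # λα.λβ.(β α): hold on to a value, then hand it to whatever comes next.
    return lambda beta: beta(alpha)


def run_sequence():
    trace = []

    def expr(name, value):
        """Stand-in for an expression E_i; records that it was evaluated."""
        trace.append(name)
        return value

    # [(Γ E_1) λπ.[(Γ E_2) λπ.π]]
    result = gamma(expr("E1", 10))(
        lambda pi: gamma(expr("E2", 20))(lambda pi: pi)
    )
    return result, trace
```

The inner combination sits inside a closure, so E_2 is not evaluated until the closure for the rest of the sequence is applied.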


Similarly, consider the expression

[(Γ E_1) λπ.[(Γ E_2) λπ.[(Γ E_3) λπ.π]]]
1           2           3

When evaluated, this expression results in successively
evaluating E_1, E_2, and E_3 and then returning the value of
E_3. This expression, however, has the following important
property, which will be used in the definition of the transfer
of control to some labeled expression in a sequence of
expressions.

Let C_1, C_2, and C_3 be the combinations that are given by the
matching pairs of square brackets indicated by the numbers
1, 2, and 3 above. The evaluation of C_1 results in
successively evaluating E_1, E_2, and E_3 and returning the
value of E_3; the evaluation of C_2 results in successively
evaluating E_2 and E_3 and returning the value of E_3; the
evaluation of C_3 results in evaluating E_3 and returning the
value of E_3.

More generally, an expression of the form

[(Γ E_1) λπ.[(Γ E_2) λπ. ... λπ.[(Γ E_n) λπ.π] ... ]]
1           2                   n

when evaluated, results in successively evaluating E_1, E_2,
..., and E_n and returning the value of E_n. Moreover, the
evaluation of any combination C_i beginning with the square
bracket denoted by the integer i results in successively
evaluating the expressions E_i, E_{i+1}, ..., and E_n and
returning the value of E_n. This latter effect leads us to the
notion of a "labeled" expression.


Labels and Label References: 


Let V be the set of variables (as described earlier) and
let L be the set obtained from V by affixing a ":" to each
variable in V. The set L will be called the set of labels.


Consider an expression of the form 
ℓ_1[(Γ E_1) λπ.ℓ_2[(Γ E_2) ... λπ.ℓ_n[(Γ E_n) λπ.π] ... ]]

where the ℓ_i, 1 ≤ i ≤ n, indicate the possible occurrences of
labels, each of which must be different. An expression of
this form will be called a "sequence" of the expressions E_1,
E_2, ..., and E_n. If we ignore the labels in an evaluation,
the evaluation of any combination C_i following some label
ℓ_i, 1 ≤ i ≤ n, results in successively evaluating E_i,
E_{i+1}, ..., and E_n and returning the value of E_n.

A sequence of the above form may occur within the body
of some λ-expression, which in turn may occur within a
sequence in the body of some encompassing λ-expression, and so
on for further encompassing λ-expressions. In the target
language the transfer of control to some labeled expression
will be designated by expressions of the form (GOTO. E),
where E is an expression referring to some label. A label
reference will be a string of the form .ℓ, where ℓ: is a
label. The value of a label reference .ℓ will consist of
two parts: (a) the combination in the innermost encompassing
λ-expression such that the combination is prefixed by the
label ℓ:, and (b) the environment within which the combination
is to be evaluated. The result of evaluating a label reference
will be called a "label-closure".

I now proceed to a presentation of the target language of 
the dissertation. 




Target Language Expressions: 


An expression in the target language is defined as
follows. Let C, V, and L be sets of symbols, called the sets
of constants, variables, and labels, as described earlier.


(a) If p is a variable or p is a constant, then p is
an expression.


(b) If E and F are expressions, then (E F) is an
expression.


(c) If v is a variable and E is an expression, then
λv.E is an expression.


(d) If v is a variable and E is an expression, then
(v ASSIGN. E) is an expression.


(e) If S is a sequence, then S is an expression.


(f) If E is an expression, then (GOTO. E) is an
expression.


Expressions of type (a), (b), and (c) are expressions in the
combined formalism as introduced previously. Expressions of
type (d), (e), and (f) are new. The evaluation of an
expression of the form (v ASSIGN. E) will result in first
changing the value of the variable v to the value of the
expression E and then returning the null string as the value
of the expression (v ASSIGN. E). If the labels in an
expression of type (e) are ignored, the evaluation of a
sequence results in successively evaluating each of the
component expressions E_1, E_2, ..., and E_n in the sequence
and returning the value of E_n. If E is an expression of the
form .ℓ, where ℓ: is a label, the evaluation of E will result
in forming the label-closure for .ℓ, and the evaluation of an
expression of the form (GOTO. E) within some sequence will
result in (a) stopping the evaluation of the expression in
which E occurs and (b) continuing by evaluating the
combination designated by the label-closure for .ℓ within the
environment specified by the label-closure. Note that this
mechanism allows transfer of control only to expressions
within the same sequence or expressions in a sequence in some
encompassing λ-expression.

The previously given notation for defining a sequence of
expressions is awkward. I thus introduce the following
alternate notation in place of the strict representation of a
sequence. Let E


be a sequence of the form

ℓ_1[(Γ E_1) λπ.ℓ_2[(Γ E_2) ... λπ.ℓ_n[(Γ E_n) λπ.π] ... ]]

where the ℓ_i, 1 ≤ i ≤ n, indicate the possible occurrences of
labels. A sequence of this form will be alternately written

ℓ_1 E_1;
ℓ_2 E_2;
...
ℓ_n E_n


The addition of expressions of type (d), (e), and (f)
takes effect when it is desired to construct a sequence of
expressions to be evaluated one after another or to interrupt
the evaluation of a sequence and to continue the evaluation
at some other labeled expression.


For example, consider the expression 


LET A=5 
IN (A ASSIGN. (+(A,1))); 
(GOTO. .P); 
(A ASSIGN. 1); 
P:A 




where "+" is a free variable whose value is the function for
computing the arithmetic sum of two integers. The evaluation
of this expression is as follows:


(1) The value of the bound variable A will be set to
five and the body of the λ-expression evaluated.


(2) Since the body of the λ-expression is a sequence
of expressions, each of the component expressions
will be evaluated in order.


(3) The first expression in the sequence results in
updating the value of A to six.


(4) The second expression results in transferring the
evaluation to the expression labeled P.


(5) The evaluation of the expression labeled P results
in returning the value of A, which has been set to
six.
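
The five steps above can be imitated with a small Python sketch. It is not the target language mechanism itself: the `Goto` exception, `run_sequence`, and the dictionary `env` are hypothetical devices chosen for illustration, and the sketch handles only jumps to labels within the same sequence (no label-closure carrying an environment).

```python
class Goto(Exception):
    """Raised by a step to transfer control to a labeled step."""

    def __init__(self, label):
        super().__init__(label)
        self.label = label


def run_sequence(steps):
    """steps: list of (label or None, thunk); returns the last value computed."""
    labels = {label: i for i, (label, _) in enumerate(steps) if label is not None}
    i, value = 0, None
    while i < len(steps):
        try:
            value = steps[i][1]()
            i += 1
        except Goto as jump:
            i = labels[jump.label]  # continue at the labeled expression
    return value


def example():
    env = {"A": 5}  # LET A=5

    def assign(name, value):
        env[name] = value
        return ""  # an ASSIGN. expression returns the null string

    def goto(label):
        raise Goto(label)

    steps = [
        (None, lambda: assign("A", env["A"] + 1)),  # (A ASSIGN. (+(A,1)));
        (None, lambda: goto("P")),                  # (GOTO. .P);
        (None, lambda: assign("A", 1)),             # (A ASSIGN. 1);  skipped
        ("P", lambda: env["A"]),                    # P:A
    ]
    return run_sequence(steps)
```

Calling `example()` follows steps (1) through (5): the third expression is never evaluated, so the value of A returned at label P is the incremented one.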


Recursive Definitions: 


Consider the following (loosely written) expression
defining the factorial function and its application to the
integer five:

LET FACT(N) = EQ(N,0) => 1
              ELSE => N*FACT(N-1)
IN FACT(5)

where EQ is a boolean valued function for testing the equality
of two integers. The function "FACT" when applied to the
argument "5" will not evaluate to five factorial. The difficulty
here arises in the definition of the function "FACT" where
the variable "FACT" itself occurs as a free variable. This
incorrect rendering of a recursive function can be corrected
through the notion of a "fixed-point operator." One
fixed-point operator for target language expressions is the
expression

Y = λF. LET π='Λ'
        IN (π ASSIGN. (F π)); π

If M is an expression and F=E is a recursive definition of the 
function F, an expression of the form 

LET F=E
IN M

where E contains free occurrences of the variable F, can be
correctly written

LET F = (Y λF.E)
IN M


To avoid this somewhat awkward method for writing recursive
functions, the following alternate notation is introduced.


If F is a variable and E and M are expressions, an
expression of the form


LET F = (Y λF.E) IN M


where Y is the fixed-point operator given above, can
alternately be written


LET REC F=E IN M

Thus the definition of the factorial function can be correctly
written

LET REC FACT(N) = EQ(N,0) => 1
                  ELSE => N*FACT(N-1)
IN FACT(5)
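
A fixed-point operator built from assignment can be sketched in Python, assuming that a variable denotes a shared store location, so that the function handed to F sees the value assigned afterwards. The class `Cell` and the names `Y`, `fact_body`, and `fact` are hypothetical; this is an illustration of the idea, not a transcription of the operator above.

```python
class Cell:
    """A mutable store location, standing in for the store component of a variable."""

    def __init__(self, value=""):
        self.value = value


def Y(F):
    pi = Cell("")     # LET π = null string
    pi.value = F(pi)  # (π ASSIGN. (F π)): F receives the location, not a copy
    return pi.value   # ...; π


def fact_body(fact_cell):
    # λFACT.λN.  EQ(N,0) => 1 ; ELSE => N*FACT(N-1)
    # FACT is looked up in the cell only when the function is called,
    # which happens after the assignment in Y has been carried out.
    return lambda n: 1 if n == 0 else n * fact_cell.value(n - 1)


fact = Y(fact_body)
```

The recursion works because the closure refers to the location of FACT rather than to the value FACT had when the closure was formed.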




The above fixed-point operator is sufficient to handle
recursive definitions of single functions but not simultaneous
recursive definitions of two or more functions. In this
dissertation simultaneous recursive definitions will not be
needed until the semantics of ALGOL/60 procedure declarations
is defined, and the presentation of a fixed-point operator to
handle simultaneous recursive definitions will be deferred
until the chapter on ALGOL/60. A detailed discussion of
fixed-point operators is given by Wozencraft.


A Definition of the Semantics of the ALGOL/60 Subset: 


The definition of the semantics of the ALGOL/60 subset in
terms of the target language is given in Appendices 2.1 and
2.2. The specification of the corresponding target language
expression for a program in the subset has been broken into
two parts. Appendix 2.1 defines the translation of a program
into the target language assuming that the primitive "+" is a
free variable. Appendix 2.2 defines the primitive "+". To
form the complete target language expression, one must take
the target language string specified in Appendix 2.1 and add
to it the primitive function definitions of Appendix 2.2 in
the form


LET CAT a=[(s. ++ "[As-" 5 "J" Ja 
IN LET EQ(α,β) = ...                                        (a)


IN LET REC +(X,Y) = EQ(Y,0) => 0 ELSE => SUM(SUCC X,PRED X)
IN LET d' IN s'




where "LET d' IN s'" is the target language string specified
by Appendix 2.1.* For example, Appendix 2.1 specifies the
following pair of strings


BEGIN INTEGER A; A:=1+2 END  ..  LET A = 'A'
                                 IN (A ASSIGN. (+('1','2')))


The string "LET A = 'A' IN (A ASSIGN. (+('1','2')))" when used
in place of "LET d' IN s'" in expression (a) above specifies
the complete target language expression for the program
"BEGIN INTEGER A; A:=1+2 END".**


3.2 An Evaluator for the Target Language 


To explain the semantics of the target language in the
previous sections, an appeal was made through the English
language. This section reduces that appeal to an appeal for
understanding only the formalism of extended Markov algorithms.


*This division of the specification of the semantics of a 
computer language into a specification of a target language 
string and a separate specification of the primitive functions 
used in the target language string will be followed in the 
definitions of SNOBOL/1 and ALGOL/60. Also, the definitions 
of the string variables for the extended Markov algorithm 
primitives are given at the beginning of Appendix 2.2. These 
definitions must be added to each extended Markov algorithm 
using the string variables. 


**It may happen that the use of identifiers in a source language 
program will conflict with the use of identifiers used to de- 
fine the primitive functions in the target language. To avoid 
this conflict, the identifiers for the target language primi- 
tives strictly speaking should be given as identifiers that 
are different from the source language identifiers. This con- 
flict can be avoided by appending to each target language 
identifier a symbol (e.g., the symbol "#") not allowed in 
source language identifiers. 




The "value" of a target language expression will be defined
in this section by an extended Markov algorithm definition of
a machine that mechanically converts an expression into another
expression, the value of the initial expression. The machine
may be viewed as a hypothetical computer for the target
language, and extended Markov algorithms may be viewed as the
machine language for the computer. The definition of the
target language evaluator is based on a similar definition
given by Landin [20, 24] and Wozencraft.

The extended Markov algorithm definition of the target
language evaluator is given in Appendix 2.3. Before applying
the algorithm to a target language expression, it is
necessary to provide a unique index for each "λ" and "(" in the
expression. Thus the expression

(λX.('SQ' X) '3')

will be indexed

(_1 λ_2 X.(_3 'SQ' X) '3')

The indices allow unique identification of a λ-expression
or combination.

The evaluation of an expression begins with a substitution
rule transforming the expression to be evaluated into
five strings: the "control" string, the "result" string,
the "environment" string, the "store" string, and the
expression itself. Subsequent substitution rules define
transformations on the control, result, environment, and store
strings until the value of the target language expression is
computed. The final substitution rule returns the value of the
expression.


Generally, the control string is a string of the form

a_k a_{k-1} ... a_1

where each a_i, 1 ≤ i ≤ k, is an atomic part of an expression
(e.g., a constant, variable, indexed lambda symbol, or indexed
left parenthesis). The control string is used to hold the
atomic parts of an expression before they are evaluated.

When the parts of the control string are evaluated, their
values are placed on the store string. The store string is a
string of the form

(111...1,r_n) ... (111,r_3)(11,r_2)(1,r_1)

where each r_i, 1 ≤ i ≤ n, is a string denoting the value of a
constant, a variable, or a λ-expression, and the string of
ones before each string value provides a unique pointer to
the string value. A new store component for a string r_{n+1} is
obtained by (a) obtaining the string of ones representing the
pointer p to r_n, and (b) prefixing the string "(1p,r_{n+1})" to
the left of the store string.
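
The allocation step (a) and (b) can be sketched in Python, assuming the store string is modeled as a list of (pointer, value) pairs with the newest component first. The function names `new_store_component` and `fetch` are hypothetical.

```python
def new_store_component(store, value):
    """Prefix a component whose pointer is one '1' longer than the newest pointer."""
    newest_pointer = store[0][0] if store else ""
    pointer = "1" + newest_pointer           # the string "1p"
    return [(pointer, value)] + store, pointer


def fetch(store, pointer):
    """Return the string value that a pointer designates."""
    for p, value in store:
        if p == pointer:
            return value
    raise KeyError(pointer)
```

Starting from the initial store `[("1", "")]`, two successive allocations yield the pointers "11" and "111", matching the left-to-right order of the store string shown above.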

The result string is used to store pointers to intermediate
calculated values formed in the evaluation of a target
language expression. The result string is a string of the form

p_m ... p_2 p_1

where each p_i, 1 ≤ i ≤ m, is a pointer to some string value in
the store.


Let N_1, M_1, N_2, M_2, ..., N_k, M_k denote strings of ones,
let v_1, v_2, ..., v_k denote variables, and let p_1, p_2, ...,
p_k denote pointers to the store. The environment string is a
string of the form

(N_k←M_k v_k=p_k) ... (N_2←M_2 v_2=p_2)(N_1←M_1 v_1=p_1)

where each component (N_i←M_i v_i=p_i), 1 ≤ i ≤ k, is a string
such that N_i identifies the environment for some λ-expression
λ_i, v_i identifies the bound variable v of λ_i, p_i is a store
pointer to the current value of v, and M_i identifies the
environment of the encompassing λ-expression. The environment
M_i is said to be "linked" to the environment N_i. In general,
the environment components linked to N_i provide pointers to
the current values of each of the bound variables in the
λ-expression λ_i and its encompassing λ-expressions. The list
of environment components linked to N_i will be called the
environment N_i. For example, consider the environment "11111"
in the environment string

(11111←11 X=111111)(1111←11 A=11)(111←11 B=111)(11←1 Y=111)(1←1 Z=1)

The environment components linked to "11111" provide store
pointers to the current values of the variables X, Y, and Z in
the λ-expression whose environment is identified by "11111".
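
The search through linked environment components can be sketched in Python using the example environment string above. The tuple layout (name, linked environment, variable, pointer) and the function name `lookup` are assumptions made for this sketch.

```python
# Each component is (N, M, v, p): environment N, linked to M, binds v to pointer p.
ENV = [
    ("11111", "11", "X", "111111"),
    ("1111", "11", "A", "11"),
    ("111", "11", "B", "111"),
    ("11", "1", "Y", "111"),
    ("1", "1", "Z", "1"),
]


def lookup(env, name, variable):
    """Follow links outward from environment `name` until `variable` is found."""
    components = {n: (linked, var, ptr) for n, linked, var, ptr in env}
    while True:
        linked, var, ptr = components[name]
        if var == variable:
            return ptr
        if linked == name:  # the initial environment is linked to itself
            return None
        name = linked
```

From "11111" the search reaches X, Y, and Z but not A or B, since the components for A and B are not on the chain of links from "11111".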
A new component is prefixed to the environment string
each time a new λ-expression is applied. Thus each N_i at the
left of each environment component identifies an environment
for some applied λ-expression, and the environment components
linked to N_i provide pointers to the values of the free
variables in the body of the λ-expression whose environment is
given by N_i. Since constants in the target language are
treated as literal strings whose values are the strings
themselves, the values of the constants in an expression are not
placed on the environment string.

The set definitions for the string variables used in
the extended Markov algorithm definition of the evaluator are
given in Appendix 2.3a. The set "STR" defines the set of all
strings that might occur within a target language expression.
The sets "CONSTANT" and "VARIABLE" define the sets of
constants and variables. The sets "PTR" and "INDEX" define
respectively the set of pointers to the store string and the
set of indices used in marking an expression. The set "EXP"
defines the set of target language expressions, the set "EXP HD"
defines the set of strings that can occur at the head of an
expression, and the set "EXP TL" defines the set of strings
that can occur at the tail of an expression. For example, in
the expression "(_1 λ_2 X.(_3 'SQ' X) '3')" the string "(_1" is
the head of the expression and the string
"λ_2 X.(_3 'SQ' X) '3')" is the tail of the expression, and in
the expression "X" the variable "X" is the head of the
expression and the tail of the expression is null.

The substitution rules for the extended Markov algorithm
definition of the target language evaluator are given in
Appendix 2.3b. Three alternate notations were used in writing
these rules:


(1) Let x_i and y_i, 1 ≤ i ≤ 5, be string variables
representing arbitrary strings used in an extended
Markov algorithm. Generally, each substitution
rule is of the form*

<c y_1 - x_2 r y_2 - x_3 e y_3 - x_4 s y_4 - x_5 p y_5>
→ <c' y_1 - x_2 r' y_2 - x_3 e' y_3 - x_4 s' y_4 - x_5 p' y_5>

where the c, r, e, s, and p are strings referring to
portions of the control, result, environment, store,
and expression strings and the c', r', e', s', and
p' are the transformed portions of these strings.
Since the x_i and y_i occur in each substitution rule,
a substitution rule of the above form will be written
in the form

c        c'
r        r'
e   →    e'
s        s'
p        p'


(2) If one of the five strings c, r, e, s, or p is given
as null on both sides of the substitution rule, the
symbol "_" can be used in place of the null string
symbol "Λ".


(3) If one of the five components c, r, e, s, or p occurs
unchanged in the right-hand side of the substitution
rule, the symbol "I" can be used in place of the
string in the right-hand side of the rule.


*The hyphen "-" is used to separate the control, result,
environment, store, and expression strings.




Thus the substitution rule

<(_i y_1 - x_2 Λ y_2 - x_3 Λ y_3 - x_4 Λ y_4 - x_5 (_i h t h' t') y_5>
→ <h' h APPLY. y_1 - x_2 Λ y_2 - x_3 Λ y_3 - x_4 Λ y_4 - x_5 (_i h t h' t') y_5>


can be written using notation (1)


(_i                      h' h APPLY.
Λ                        Λ
Λ                   →    Λ
Λ                        Λ
(_i h t h' t')           (_i h t h' t')


and further written using notations (2) and (3)


(_i                      h' h APPLY.
_                        _
_                   →    _
_                        _
(_i h t h' t')           I


Three example evaluations of target language expressions
are given on the adjacent pages. Each of these evaluations
shows the successive transformations on one of the initial
expressions:*


('SQ' '3')      LET X='3'          LET X='3'
                IN ('SQ' X)        IN (X ASSIGN. '4');
                                      (GOTO. .L);
                                      (X ASSIGN. '5');
                                      L: X


*The constant 'SQ' in the first two expressions represents
the primitive function for squaring an integer. Strictly
speaking, all primitive functions in the target language
must be defined by constants that are extended Markov
algorithms.


[Figure: the three example evaluations of the expressions above,
showing the successive control, result, environment, and store
strings.]

Note: In this example, decimal digits will be used in place of
the corresponding strings of ones denoting store pointers and
environment names.


Initialization and Termination of Evaluation (rules 1 and 12)* 


The evaluation of an expression begins (rule 1) by
initializing the control string with the head of the
expression to be evaluated and the marker "|_1", initializing
the result string with the marker "|_1", initializing the
environment string with the string "(1←1 π=1)", initializing
the store string with the string "(1,Λ)", and initializing the
expression string with the expression to be evaluated. Since
the initial environment will generally contain the values of
no free variables, the initial environment string contains
the dummy variable π whose value is a pointer to the null
string in the store. The marker "|_1" is placed on the control
and result string to denote that the head of the expression is
to be evaluated within the initial environment 1. In general,
the subscript j of the leftmost |_j in the control string
denotes that the control string variables to the left of the
|_j are to be evaluated using the environment j, i.e., using
the environment components linked to the component
(N_i←M_i v_i=p_i) where N_i=j.
The evaluation terminates (rule 12) when the control
string is null. When the control string is null, the result
string will contain a pointer to some string value in the
store. The string in the store is returned as the result of
the evaluation. In general, the result of an evaluation is
either a constant or a λ-closure. Strictly speaking, if
the result of the evaluation is a λ-closure, the λ-expression
and the values of its free variables should be returned as the
result of the evaluation. If the result of the evaluation is
a λ-closure, the λ-expression and the values of its free
variables can be obtained from the environment, store, and
expression strings specified prior to the termination of
evaluation.

*Rules 1 and 12 do not exactly follow the alternate notation
for the evaluator given earlier. These rules are strictly
given as

1:   h t → <h |_1 - |_1 - (1←1 π=1) - (1,Λ) - h t>
12:  <Λ - p - x_3 y_3 - x_4 (p,r) y_4 - x_5 y_5> → r

If a user were evaluating target language expressions
with input-output facilities, (a) the initial values of the
input and output strings (presumably those given on some
device like a teletype or card reader) could be placed in
the initial store string and (b) two system variables and
pointers to their initial values could be placed on the
initial environment string. The addition or removal of
strings on the input or output device could then be defined
by updating the values of the system variables to their new
values. This is the mechanism used to define input-output
in SNOBOL/1 (see Chapter IV).


Evaluation of Combinations (rule 2):


If a left parenthesis of a combination is at the left
of the control string, the left parenthesis is removed from
the control string,* and the head of its operand and operator
are prefixed to the control string and the string "APPLY." is
placed to the right of these two strings. Subsequent rules
will evaluate the operand and operator, and then apply the
value of the operator to the value of the operand to produce
the value of the combination.


Evaluation and Application of λ-expressions (rules 3, 8, and 11):


If the name λ_i of a λ-expression is at the left of the
control string (rule 3), the current environment j (initially
the dummy environment 1) is obtained, the string "λ_i,j" is
placed in a new component at the left of the store string,
and a pointer to the new store component is prefixed to the
result string. The string "λ_i,j" represents the λ-closure
for λ_i in that (a) λ_i provides a name uniquely identifying
the λ-expression λ_i contained in the expression string and
(b) the environment component j provides the (linked) list of
the pointers to the current values of the free variables of
the λ-expression λ_i.

If the string "APPLY." is at the left of the control
string, a pointer p to a λ-closure λ_i,j is at the left of the
result string, and k is the index of the most recently added
environment component (rule 8):

(a) a new component (1k←j v=p'), where v is the bound
variable of the λ-expression λ_i and p' is a pointer
to the operand to which the λ-expression λ_i has
been applied, is prefixed to the environment string.
(This action results in setting the proper environment
for evaluating the body of the λ-expression λ_i.)

(b) the head of the body of the λ-expression λ_i and a
marker |_1k are prefixed to the control string, and

(c) the pointers p and p' to the λ-closure and its
operand are deleted from the result string and the
marker |_1k is prefixed to the result string.

If a marker |_j is at the left of the control string and
a pointer p and marker |_j are at the left of the result string
(rule 11), the markers are deleted and the pointer p is left on
the result string. The pointer will point to the value of
applying the λ-expression to its operand.

*In the discussion to follow, unless explicitly stated
otherwise, the elements referred to at the left of the
control string are assumed to be deleted from the control
string after being evaluated.


Evaluation of Variables and Constants (rules 4 and 6): 


If a variable is at the left of the control string, a
pointer to the current value of the variable is prefixed to
the result string (rule 4.1). The pointer is obtained by
(a) obtaining the index j of the current environment and
marking the environment component j with a marker symbol (rule
4.3), and (b) then searching (rules 4.1 and 4.2) through the
environment components linked to j for the occurrence of the
variable.

If a constant is at the left of the control string (rule
6), a new store component containing the constant is
prefixed to the store string, and the pointer to the new store
component is prefixed to the result string.*


Evaluation of Label References (rules 5): 


If a label reference .ℓ is at the left of the control
string (rules 5), each environment component linked to the
current environment component is searched for the occurrence
of a component such that the λ-expression whose environment
is specified by the component contains a body that is a
sequence containing the label. If the label is found, a
new store component "h,j" containing the head of the expression
following the label and the index j of the environment
component is prefixed to the store, and a pointer to the new
store component is placed on the result string. The head of
the labeled expression and the environment index j provide
a representation of the label-closure for .ℓ in that the
head of the labeled expression uniquely identifies the labeled
combination and the index j uniquely identifies the current
environment of the sequence within which the combination
occurs.


Transfer of Control (rule 10): 


If the string "GOTO. APPLY." is at the left of the control
string and a pointer p to a label-closure "h,j", where
h is the head of a labeled expression and j is the environment
within which the labeled expression is to be evaluated,
is at the left of the result string,


(a) all portions of the control and result strings to
the left of the markers |_j are deleted, and


(b) the head of the expression following the label is
prefixed to the control string.


This mechanism results in interrupting the evaluation of the
current expression and continuing with the evaluation at the
labeled expression using the environment j specified while
evaluating the label-closure.

*In the evaluator, all constants that are extended Markov
algorithms must be enclosed by the quotation marks ‘. and


Application of Constants (rules 9.1 and 9.2):


If the string "APPLY." is at the left of the control
string, and two store pointers p and p' to the strings s
and s' are at the left of the result string, the string s
is applied to the string s' (presumably s is an extended
Markov algorithm and s' is the object string to which the
algorithm is to be applied). The resulting string value is
placed in a new store component, and the pointer to the new
component is prefixed to the result string.


Assignment (rules 7.1 and 7.2): 


If the string "ASSIGN. APPLY." is at the left of the
control string and two store pointers p and p' are at the
left of the result string, the string value in the store
associated with p is changed to the string value associated
with p'.
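
The interplay of the control, result, environment, and store strings can be sketched as a small Python machine. This is an approximation written for illustration, not the extended Markov algorithm of Appendix 2.3. Its assumptions: expressions are nested tuples, pointers and environment names are integers rather than strings of ones, primitive constants are Python callables, markers are kept on the control only, and assignment, label references, and GOTO are omitted. All names in it are hypothetical.

```python
def evaluate(expr):
    store = {1: ""}                  # pointer -> value; pointer 1 holds the null string
    env = {1: (1, "pi", 1)}          # name -> (linked environment, variable, pointer)
    control = [expr, ("MARK", 1)]    # control[0] is the leftmost element
    result = []                      # result[0] is the leftmost element

    def alloc(value):
        pointer = max(store) + 1
        store[pointer] = value
        return pointer

    def lookup(name, variable):
        while True:
            linked, var, pointer = env[name]
            if var == variable:
                return pointer
            if linked == name:
                raise NameError(variable)
            name = linked

    while control:
        item = control.pop(0)
        if item == "APPLY":
            operator, operand = result.pop(0), result.pop(0)
            value = store[operator]
            if callable(value):                      # application of a primitive constant
                result.insert(0, alloc(value(store[operand])))
            else:                                    # application of a closure
                (_, var, body), closure_env = value
                new_env = max(env) + 1
                env[new_env] = (closure_env, var, operand)
                control[:0] = [body, ("MARK", new_env)]
        elif item[0] == "MARK":
            pass                                     # body finished; its pointer stays on result
        elif item[0] == "app":
            _, operator, operand = item
            control[:0] = [operand, operator, "APPLY"]   # operand is evaluated first
        else:
            # The current environment is named by the leftmost marker on the control.
            current = next(c[1] for c in control if c != "APPLY" and c[0] == "MARK")
            if item[0] == "lam":
                result.insert(0, alloc((item, current)))     # closure
            elif item[0] == "var":
                result.insert(0, lookup(current, item[1]))
            else:                                            # ("const", value)
                result.insert(0, alloc(item[1]))
    return store[result[0]]


square = lambda s: str(int(s) ** 2)   # stand-in for the primitive 'SQ'
example = ("app", ("lam", "X", ("app", ("const", square), ("var", "X"))), ("const", "3"))
```

Here `example` encodes LET X='3' IN ('SQ' X), and `evaluate(example)` walks through the same kinds of steps as rules 2, 3, 4, 6, 8, 9, and 11: the operand '3' is stored first, the closure is formed, a new environment component binds X, and the primitive is applied to the stored string.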


Addition of New Rules to the Evaluator: 


It may happen that certain source language constructions 
are awkward to define solely within the target language and 
that these constructions can be more easily defined by adding 
new expressions to the target language and new evaluator 
rules to evaluate these expressions. 

The rule applied to evaluate target language expressions
is specified by the numerically first rule that is applicable
to the current string values of the control, result,
environment, store, and expression strings. By adding a rule to
the evaluator whose left part specifies a configuration of the
control, result, environment, store, and expression strings
that, for the given configuration, provides a different
transformation from the initial evaluator rules, the evaluator
can be extended to define new types of target language
expressions.

Generally, the rule applied by the evaluator is determined
by the element at the left of the control string. For
example, in the definition of indirect addressing in SNOBOL/1,
it was desired to add a rule to the evaluator that would take
some string value given in store and prefix the string value
to the control string. The string value prefixed to the control
string would then be evaluated in subsequent transformations
as if the string value were itself a variable. By (a) allowing
expressions of the form "(LOOKUP. X)", where X is a
variable, in the target language translation of SNOBOL/1, and
(b) adding the rule


LOOKUP. APPLY. 
Pp 


(p.s) 


THI >a 


to the evaluator, the extended evaluator defines indirect
addressing. None of the initial evaluator rules are applicable
to a configuration where the string "LOOKUP." is at the
left of the control string; hence the rule can be placed in
any numerical position within the initial sequence of rules.
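
The selection of the numerically first applicable rule, and the reason the position of the added rule does not matter, can be sketched in Python. This is a simplified model under assumptions: a configuration is a dictionary holding only the control, result, and store strings, and lookup_transform encodes one reading of the added rule (the pointer is removed from the result string and the stored string is prefixed to the control string). The names step, lookup_applies, lookup_transform, and LOOKUP_RULE are hypothetical.

```python
def step(rules, config):
    """Apply the first rule, in numerical order, whose left part matches."""
    for applies, transform in rules:
        if applies(config):
            return transform(config)
    raise ValueError("no evaluator rule applies")


def lookup_applies(config):
    """Left part: "LOOKUP. APPLY." heads the control string, a pointer heads the result."""
    return config["control"][:2] == ["LOOKUP.", "APPLY."] and len(config["result"]) > 0


def lookup_transform(config):
    # Assumed reading of the added rule: the pointer p is taken from the
    # result string and the string stored at p is prefixed to the control.
    p, *rest = config["result"]
    return {
        "control": [config["store"][p]] + config["control"][2:],
        "result": rest,
        "store": config["store"],
    }


LOOKUP_RULE = (lookup_applies, lookup_transform)
```

Because no other rule in this model matches a control string headed by "LOOKUP.", inserting LOOKUP_RULE before or after the other rules yields the same step.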


3. Discussion 


This chapter has presented a formally based target lan- 
guage in which the semantics of a computer language can be 
defined. The semantics of the target language was, in turn, 
defined in terms of the formalism of extended Markov algorithms 
by giving an extended Markov algorithm definition of a machine 
for evaluating target language expressions. 


If used as a target language for the implementation* of
a computer language, the target language allows the simple
addition of built-in machine primitives. For example, if a
computer has a built-in primitive for computing the sum of
two integers, there is no need to define this primitive in
the target language. This primitive can be used as a constant
in the target language, and in applying the primitive to its
arguments the machine algorithm can be used. The point of
using only extended Markov algorithms to define primitive
functions is that for implementation of the target language
the only necessary machine capability is that for implementing
extended Markov algorithms. The fact that a given
machine has certain built-in primitives simply relieves the
person defining the semantics of a source language of defining
the semantics of the built-in primitives in terms of
extended Markov algorithms.


*Extended Markov algorithms have been implemented in the
source language PANON-1B.

The target language is undesirable in one important
sense. The computer language constructions for defining the
assignment of new values to variables and for defining the
transfer of control within a program required the addition
of new expressions to the combined formalisms of extended
Markov algorithms and the λ-calculus. The new expressions
add to the complexity of the target language and place
restrictions on the applicability of any theorems developed for
λ-calculus expressions. This undesirable feature of the
target language is, in part, redeemed in that the evaluator
for the target language was completely defined within the
formalism of extended Markov algorithms. Nevertheless, this
fault of the target language remains and I hope that
future research will resolve this difficulty.

On the other hand, the target language is sufficient to
define the semantics of both SNOBOL/1 and ALGOL/60. The
presentation of the syntax and semantics of these two languages
will comprise the next two chapters of this dissertation.


Do you know who Kohmar Pehriad was?
Hint: he has certainly left his mark on history.


In the sixth century B.C. written language was continuous.
There was no concept of breaking up units of expressions with
punctuation marks. Kohmar Pehriad, a leading Macedonian
literary figure, had the insightful idea of using a small
round dot to indicate the end of a thought unit. Convinced
of the utility of his invention, he spent almost thirty years
of his life traveling through ancient Greece, Rome, and North
Africa attempting to gain local acceptance of that small
round dot. His effort was well-rewarded. The stark simplicity
of his brilliant idea became popular so quickly that
almost every written language used today uses the little
round dot at the end of a unit of expression.

Pehriad's efforts did not stop with the dot. Recognizing
the need for another mark to indicate pauses in the middle
of thought units, he began using a dot with a curved descending
tail in an expression to indicate a pause in the thought.
This mark is, of course, quite familiar in our own language,
and both the comma (Kohmar) and the period (Pehriad) have
been named after their distinguished inventor.


CHAPTER IV 


A DEFINITION OF THE SYNTAX AND 
SEMANTICS OF SNOBOL/1 


In this chapter I attempt to demonstrate the thesis of
this dissertation, that there should be formal definitions
of the syntax and semantics of computer languages. As an
example computer language, I have chosen SNOBOL/1, as initially
defined by Farber, Griswold and Polonsky. SNOBOL/1 was
chosen as an example because (a) the language is simple
enough to describe conveniently in a single chapter of this
dissertation and (b) the language is fairly well-known. No
knowledge of SNOBOL/1 will be assumed in this chapter. Rather,
it is the intent of this chapter to define every construct
(except character spacing) in the language. The definition
of SNOBOL/1 will be in two parts: (a) an informal description,
in English, of the language and of the techniques used in the
formal definition, given in this chapter, and (b) a
formal description of the language in Appendix 3 using the
formal system.

This chapter and the formal description of Appendix 3
may be viewed as a reference manual for SNOBOL/1. It is
intended for a user who wishes a detailed description of the
language.

The formal definition of SNOBOL/1 is divided into three
parts. Appendix 3.1 gives the canonical system defining the
syntax of SNOBOL/1, Appendix 3.2 gives the canonical system
defining the translation of SNOBOL/1 into the target language,
and Appendix 3.3 gives the definition of the primitive
functions used in the target language. In writing the formal
definition of SNOBOL/1, it was necessary to resolve a
few issues that were ambiguously or incompletely defined by
the English language definition of the language given by
Farber, Griswold and Polonsky.*


*For example, it was not clear whether the authors meant to
permit or prohibit the use of the same variable name to
denote different types of variables in a single pattern
matching rule or whether to permit or prohibit the use of
a name both as a string name and a label in the same
program. I decided to prohibit the first of these constructions
and to permit the second of these constructions.


Introduction to SNOBOL/1


SNOBOL/1 is a language for defining transformations on
strings of symbols. Programs in SNOBOL/1 are comprised of
a linear sequence of rules of which there are four varieties:
"input" rules for obtaining strings of symbols from some
external input device (like a teletype or card reader),
"assignment" rules for assigning names to strings, "pattern
matching" rules for transforming strings into new strings,
and "output" rules for writing strings on some external
output device (like a teletype or card reader). In general,
the behavior defined by each rule is executed in linear
order. However, rules can be labeled with names and the
ordinary sequence of execution interrupted and continued at
some other labeled rule.


Introduction to the Techniques Used in Describing SNOBOL/1


The parts of this chapter will each describe some construct
in SNOBOL/1, e.g., a string, an arithmetic expression,
a rule, or a statement. Each of these parts will consist
of (a) portions of the productions from the canonical
system of the translation (Appendix 3.2) of SNOBOL/1, (b)
examples of the SNOBOL/1 constructs and their corresponding
target language translations, and (c) an English language
explanation of these constructs and their semantics as
defined in the target language.

Theoretically, the (abbreviated) canonical system of the
translation of SNOBOL/1 must be combined with the canonical
system of the syntax of SNOBOL/1 to obtain the complete
canonical system defining the set of legal programs and their
target language translations. Nevertheless, except for the
context-sensitive requirements on SNOBOL/1, the abbreviated
canonical system of the translation of SNOBOL/1 provides a
synopsis of a context-free specification of the language and
its semantics in terms of the target language. Accordingly,
the productions from the (abbreviated) canonical system of the
translation will be used in the text to define the syntax
and semantics of SNOBOL/1, and the specification of the
context-sensitive requirements on syntax will be discussed at
the end of the chapter.


As mentioned in the previous chapter, the first term of
each term tuple in the specification of the translation of a
language is generally of the form "s..t" where "s" represents
some string in the source language and "t" represents the
corresponding target language translation. The example
SNOBOL/1 strings and their target language translations
given in the text follow this notation.


Strings 


DIGIT<0>,<1>, ... ,<9>;

LETTER<A>,<B>, ... ,<Z>;

MARK<%>,<.>,<=>, ... ,</>;

DIGIT<p> | LETTER<p> | MARK<p> → BASIC SYMBOL<p>;
BASIC SYMBOL<b> → STRING<SEQ(b)>;


Example Strings: 


ABC123% A ROSE IS A ROSE 
HESSE,KAFKA, MANN ALPHA 

The basic symbols in SNOBOL/1 are the decimal digits,
the capital English letters, and a variety of other symbols
like "$", "." and "=". A string, the basic data type,
consists of any linear sequence of basic symbols.


Names 

DIGIT<p> | LETTER<p> → NAME<p>;
NAME<m>,<n> → NAME<mn>,<m.n>;
NAME<n> → STR NAME<n..n>,<$n..(LOOKUP. n)>;
NAME<n> → VAR NAME<n>;
NAME<n> → BACK REF NAME<n>;




Example Names: 


ALPHA 1234 
ABC.EFG 12.3 
$BETA $1234 


A string can be assigned a name and the name used in
place of the string. A name consists of a sequence of decimal
digits and English letters, possibly including medial periods.

Besides designating a string, a name can be used in two
other contexts, that of a string "variable" and that of a
string "back reference." These three uses of names shall be
distinguished by calling a name that designates a string a
"string name," a name that designates a variable a "variable
name," and a name that designates a back reference a "back
reference name." A string name is treated as a variable in
the target language.
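
The shape of a name given by the productions above (digits and capital letters, with periods allowed only between two shorter names) can be checked mechanically. The following Python sketch is an informal paraphrase of those productions as a regular expression; NAME_RE and is_name are hypothetical names, and the formal definition remains the canonical system.

```python
import re

# One or more digits or capital letters, optionally followed by further
# such groups, each introduced by a single medial period.
NAME_RE = re.compile(r"[A-Z0-9]+(?:\.[A-Z0-9]+)*")


def is_name(text):
    """Return True if text has the shape of a SNOBOL/1 name."""
    return NAME_RE.fullmatch(text) is not None
```

Under this reading "ABC.EFG" and "12.3" are names, while a leading, trailing, or doubled period is rejected. The dollar sign of an indirect reference is not part of the name itself.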

A string name can be indirectly referenced by prefixing
a string name with a dollar sign. The string value of a
string name prefixed by a dollar sign is the string whose name
is the string value of the name prefixed by the dollar sign.
For example, if the string value of the name "BETA" is the
string "A ROSE IS A ROSE" and if the string value of the name
"A" is the string "BETA", the string value of "$A" is the
string "A ROSE IS A ROSE". The primitive function "LOOKUP."
is used to handle indirect addressing in the target language.
"LOOKUP." is defined by an extended Markov algorithm
substitution rule (Appendix 3.3a) that must be added to the target
language evaluator.* When evaluated, this substitution rule
inserts the string value of a name at the left of the control
string. Thus the string is treated as if it were itself a
variable to be evaluated in subsequent steps taken by the
evaluator.
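
The effect of an indirect reference can be mimicked with an ordinary dictionary. The Python sketch below is only an analogy to the "LOOKUP." rule: the environment env and the function value_of are hypothetical, and the real mechanism works by prefixing the looked-up string to the control string rather than by a nested dictionary access.

```python
def value_of(env, name):
    """Return the string value of a string name, following one '$' indirection."""
    if name.startswith("$"):
        # The value of the inner name is itself used as a name.
        return env[env[name[1:]]]
    return env[name]


env = {"BETA": "A ROSE IS A ROSE", "A": "BETA"}
print(value_of(env, "$A"))
```

With the values used in the example of the text, the value of "$A" is the string named by the value of "A", that is, the value of "BETA".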


DIGIT<d> → DIGIT STR<SEQ(d)>;

DIGIT STR<s> → INT<s>,<-s>;

INT<i> → ARITH OPERAND<“i”..'i'>;
STR NAME<n..n'> → ARITH OPERAND<n..n'>;


ARITH OPERAND<a..a'>,<b..b'> → ARITH EXP<a+b..(+(a',b'))>,
                                         <a-b..(-(a',b'))>,
                                         <a*b..(*(a',b'))>,
                                         <a/b..(/(a',b'))>;


Example Arith Operands:       Example Arith Expressions:
“65”..'65'                    A+B..(+(A,B))
“-65”..'-65'                  A+“65”..(+(A,'65'))
A..A                          A*“-65”..(*(A,'-65'))


SNOBOL/1 allows a limited type of arithmetic on strings
whose contents are integers. An integer can be used directly
as an arithmetic operand by enclosing the integer in the
quotation marks “ and ”. A name whose string value is an
integer can also be used as an arithmetic operand. An
arithmetic expression consists of an arithmetic operand
followed by one of the arithmetic operators "+", "-", "*",
and "/" (defined in Appendix 3.3b) followed by another
arithmetic operand. The string value of an arithmetic expression
is the string computed by applying the arithmetic operator
to the integer values of the two operands.


*As mentioned in the chapter describing the target language
evaluator, it may occasionally be convenient to define some
source language constructs by adding rules to the evaluator
rather than by defining the constructs solely within the
target language. To define indirect addressing in the target
language would require complicated additions to the canonical
system of the translation of SNOBOL/1.
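
Arithmetic on integer-valued strings can be illustrated in Python as follows. This is a sketch, not the definition of Appendix 3.3b: the function arith and the table OPS are hypothetical, and the behavior of "/" is an assumption (Python floor division is used here, which rounds toward negative infinity; the source text does not say how SNOBOL/1 rounds).

```python
import operator

# Assumed mapping from operator symbols to integer operations.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,  # assumption: integer division
}


def arith(op, left, right):
    """Apply op to two integer-valued strings and return a string."""
    return str(OPS[op](int(left), int(right)))
```

Operands and results are strings throughout, matching the statement that the value of an arithmetic expression is itself a string.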


String Expressions 


STRING EXP<Λ..'Λ'>;
STRING<s>                  → STRING EXP<“s”..'s'>;
STR NAME<n..n'>            → STRING EXP<n..n'>;
ARITH EXP<a..a'>           → STRING EXP<a..a'>;
STRING EXP<s..s'>,<t..t'>  → STRING EXP<s␣t..((CAT s') t')>;


Example String Expressions:


Λ..'Λ'                     NAME REVERSE..((CAT NAME) REVERSE)
“ABC123%”..'ABC123%'       “ABC” A..((CAT 'ABC') A)
A..A                       X Y Z..((CAT ((CAT X) Y)) Z)
$A..(LOOKUP. A)


A string expression in SNOBOL/1 is an expression whose
value is a string. A string can be used directly in a string
expression by enclosing the string in the quotation
marks “ and ”. A string name or arithmetic expression can
also be used in a string expression. A sequence of string
expressions each separated by one or more spaces* comprises
a complete string expression. The value of a string
expression is the string computed by concatenating the string values
of each of the component string expressions.


*The symbol "␣" denotes one or more spaces.
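
The left-nested use of "CAT" in the translation of a sequence of string expressions can be reproduced with a fold. The Python sketch below assumes the component expressions have already been translated to target language text; translate_sequence and evaluate_sequence are hypothetical helper names.

```python
from functools import reduce


def translate_sequence(translated_parts):
    """Combine translated components into nested ((CAT left) right) text."""
    if not translated_parts:
        return "'Λ'"  # the null string expression
    return reduce(lambda left, right: f"((CAT {left}) {right})", translated_parts)


def evaluate_sequence(values):
    """The value of the sequence is the concatenation of the component values."""
    return reduce(lambda left, right: left + right, values, "")


print(translate_sequence(["X", "Y", "Z"]))
```

For the components X, Y, Z this prints the same nesting as the example translation of "X Y Z" above.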




Patterns* 


STRING<s> → PAT EXP<“s”..'s'>;
STR NAME<n..n'> → PAT EXP<n..n'>;
VAR NAME<n> → PAT EXP:SPECS<*n*..'n' : n∈STR|>;
VAR NAME<n> → PAT EXP:SPECS<*(n)*..'n' : n∈BAL STR|>;
VAR NAME<n>, DIGIT STR<d> → PAT EXP:SPECS<*n/d*..'n' :
                                          (n,d)∈FIX LN STR|>;
BACK REF NAME<n> → PAT EXP<n..'n'>;
PAT EXP<p..p'>,<q..q'> → PAT EXP<p␣q..((CAT p') q')>;
PAT EXP<p..p'> → PATTERN<p..p'>;


Example Patterns:


“ABC”..'ABC'

X Y..((CAT X) Y)

*NAME*..'NAME' : NAME∈STR |

*NAME* “,”..((CAT 'NAME') ',') : NAME∈STR |

*X* “ABC” *(Y)*..((CAT ((CAT 'X') 'ABC')) 'Y') : X∈STR | Y∈BAL STR |

*X* Y X..((CAT ((CAT 'X') Y)) 'X') : X∈STR |

A pattern in SNOBOL/1 is the basic unit through which 
string transformations are accomplished. A pattern can be 
viewed as an expression representing a set of strings. 

A string enclosed by quotation marks is a pattern
expression representing the set of strings containing one member,
the string itself. A string name is a pattern representing
the set of strings containing one member, the string value of
the string name. A variable name enclosed by asterisks is a
pattern expression representing the set of all strings of
basic symbols. A variable name enclosed by parentheses and
further enclosed by asterisks is a pattern expression
representing the set of all strings containing balanced pairs of
parentheses. A variable name followed by a slash and a
positive integer and enclosed by asterisks is a pattern
expression representing the set of all strings whose number of basic
symbols is given by the integer following the slash. A name
that occurs elsewhere in a pattern as a variable name is a
pattern expression representing the same set of strings
represented by the variable name. A name used in this context
is called a back-referenced name.


*The use of the auxiliary term for the predicate part "SPECS"
will be discussed shortly.

A sequence of pattern expressions each
separated by one or more spaces comprises a complete pattern.
A sequence of pattern expressions represents the set of all
strings composed by concatenating representative strings from
each of the sets represented by the component pattern
expressions. This set is restricted in that a string used in
place of a back reference name must be identical to the
string used in place of the corresponding variable name.

A pattern is used to scan a given object string for the
existence of one of the strings represented by the pattern.
If more than one string represented by the pattern occurs
within the object string, the member M such that (a) each of
the strings (except the last) concatenated to form M is, from
left to right, as short as possible and (b) the last string
concatenated to form M is as long as possible is taken as the
occurrence of the pattern in the object string.
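
The selection rule just stated (each component as short as possible from left to right, the last as long as possible) resembles the behavior of lazy and greedy quantifiers in a backtracking regular expression matcher. The Python sketch below rests on that analogy and on several assumptions: the tuple encoding of pattern expressions and the name compile_pattern are hypothetical, string names are taken to be already replaced by their values, balanced-string variables are omitted (they cannot be expressed with Python's re module), variable names must be valid Python group names, and the search is assumed to prefer the leftmost starting position.

```python
import re


def compile_pattern(parts):
    """Build a regular expression from a list of pattern expressions.

    Each part is one of (hypothetical encoding):
      ("lit", text)      a quoted string
      ("var", name)      *name*    any string
      ("fix", name, n)   *name/n*  any string of exactly n symbols
      ("ref", name)      a back reference to an earlier variable
    """
    last = len(parts) - 1
    pieces = []
    for position, part in enumerate(parts):
        kind = part[0]
        if kind == "lit":
            pieces.append(re.escape(part[1]))
        elif kind == "var":
            # Lazy everywhere except in the final position, where it is greedy.
            quantifier = ".*" if position == last else ".*?"
            pieces.append(f"(?P<{part[1]}>{quantifier})")
        elif kind == "fix":
            pieces.append(f"(?P<{part[1]}>.{{{part[2]}}})")
        elif kind == "ref":
            pieces.append(f"(?P={part[1]})")
        else:
            raise ValueError(f"unknown pattern part: {part!r}")
    return re.compile("".join(pieces), re.DOTALL)
```

For instance, compile_pattern([("var", "NAME"), ("lit", ",")]) applied with search to "HESSE,KAFKA,MANN" binds NAME to "HESSE", the shortest string that can be followed by a comma.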




Pattern Matching Rules 


STR NAME<n..n'>, STR EXP<s..s'>, PATTERN:SPECS:VAR REFS
     <p..p':c:v> → PAT MATCH RULE<n␣p␣=s..
          (MATCH_AND_ASSIGN(n',p',λm.s','c','(v)'))>;


Example Pattern Matching Rules:


X “ABC”=..(MATCH_AND_ASSIGN(X,'ABC',λm.'Λ','','()'))

X *NAME* “,”=..(MATCH_AND_ASSIGN(X,((CAT 'NAME') ','),
     λm.'Λ','NAME∈STR |','(NAME,)'))

X ALPHA = BETA..(MATCH_AND_ASSIGN(X,ALPHA,λm.BETA,'','()'))


A pattern matching rule consists of a string name followed
by a pattern, an equal sign, and a string expression. The
execution of a pattern matching rule results in the following
sequence of actions:


(a) The string value of the string name is scanned for
    the occurrence of the pattern.


(b) If the occurrence of the pattern is found

    (i) each string variable in the pattern is
        assigned the value of the substring used
        in matching the variable to the object
        string,

    (ii) the string expression is evaluated (using
         the new values of the string variables), and

    (iii) the occurrence of the pattern in the object
          string is replaced by the string value of
          the string expression and the string name
          is assigned the value of this newly formed
          string.


(c) If the occurrence of the pattern is not found, no
    action is taken.


The pattern matching capability of SNOBOL/1 is handled
in the target language through the function "MATCH_AND_ASSIGN"
(see Appendix 3.3c), which essentially forms an extended Markov
algorithm that reflects the same transformation defined by
the pattern. In the formation of the extended Markov
algorithm, the variable and back reference names are treated as
extended Markov algorithm string variables. Hence the
translation of a variable or back reference name is given as a
constant (see definition of patterns given previously), the
variable names are specified as extended Markov algorithm
string variables representing members of one of the sets
"STR", "BAL STR", and "FIX LN STR" (see the auxiliary term
for the predicate part "SPECS" in the definition of a pattern)
defined in Appendix 3.1a, and the lists of variable names*
and their set specifications are passed as arguments to the
function "MATCH_AND_ASSIGN". The evaluation of the function
"MATCH_AND_ASSIGN" results in the following actions:


(a) An attempt is made to match the pattern to the
    object string.


(b) If a match is found, the values of the variables
    are updated, the value of the string expression
    is computed, the name to which the pattern has
    been applied is updated to its new value, and the
    string "TRUE" is returned.


(c) If no match is found, the string "FALSE" is
    returned.
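
The three actions above can be paraphrased in Python. The sketch is an analogy only: the real "MATCH_AND_ASSIGN" of Appendix 3.3c builds an extended Markov algorithm, whereas this hypothetical match_and_assign takes a compiled regular expression whose named groups play the role of the pattern variables, a dictionary env as the environment, and a function string_exp that computes the value of the string expression from the updated environment.

```python
import re


def match_and_assign(env, name, regex, string_exp):
    """Match regex against env[name]; on success update env and return "TRUE"."""
    subject = env[name]
    match = regex.search(subject)
    if match is None:
        return "FALSE"  # (c) no match: nothing is changed
    # (b) update the pattern variables, then evaluate the string expression
    env.update(match.groupdict())
    replacement = string_exp(env)
    # replace the occurrence of the pattern and update the named string
    env[name] = subject[:match.start()] + replacement + subject[match.end():]
    return "TRUE"
```

The string expression is evaluated after the variables are updated, so it may refer to the substrings just matched.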


*The list of variable names is given by the auxiliary term
for the auxiliary predicate part "VAR REFS" generated in the
canonical system for the syntax of SNOBOL/1. This auxiliary
term is also generated in the complete (unabbreviated)
canonical system of the translation of SNOBOL/1 and is used
to specify the translation of SNOBOL/1 as indicated above.




Input Rules and Output Rules 


PATTERN:SPECS:VAR REFS<p..p':c:v>
     → INPUT RULE<SYS .READ p..(MATCH_AND_ASSIGN
          (READER#,p',λm.'Λ','c','(v)','v'))>;
STRING EXP<s..s'> → OUTPUT RULE<SYS .PRINT s..
          (PRINTER# ASSIGN. ((CAT PRINTER#) s'))>;


Example Input and Output Rules:


SYS .READ *X*..(MATCH_AND_ASSIGN(READER#,'X',λm.'Λ',
     'X∈STR |','(X,)','X,'))
SYS .PRINT REVERSE..(PRINTER# ASSIGN. ((CAT PRINTER#) REVERSE))


An input rule consists of the string "SYS .READ" followed
by a pattern. An output rule consists of the string
"SYS .PRINT" followed by a string expression.

The input and output of strings from some external
device is defined in the target language by assuming that
there are two system variables "READER#" and "PRINTER#" that
contain the initial values of the input and output strings.*
When a string is input into a program, the value of the system
variable "READER#" is changed to the string computed from the
current value by deleting the string to be read in, and the
values of the string variables in the pattern are updated.
The pattern matching and updating of variables are handled
through the function "MATCH_AND_ASSIGN" described previously.

When a string is output from a program, the value of the
system variable "PRINTER#" is updated by appending the string
value of the string expression.


*The initial values of these variables can be added to the
initial environment named Ay in the target language evaluator.
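
The treatment of input and output as updates to two system variables can be pictured as follows. This Python sketch covers only the simplest input pattern, a single variable *X*, which by the matching rule takes the longest possible string and therefore the whole of "READER#". The functions sys_read_all and sys_print and the dictionary env are hypothetical.

```python
def sys_read_all(env, variable):
    """SYS .READ *variable*: the variable takes all of READER#, which is emptied."""
    env[variable] = env["READER#"]
    env["READER#"] = ""


def sys_print(env, value):
    """SYS .PRINT: append the value of the string expression to PRINTER#."""
    env["PRINTER#"] = env["PRINTER#"] + value
```

Input patterns with several components would instead go through the general matching function, deleting only the matched portion of "READER#".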


Assignment Rules 

STR NAME<n..n'>, STR EXP<s..s'> → ASSIGN RULE
     <n=s..(n' ASSIGN. s')>;


Example Assignment Statement:


REVERSE = X REVERSE..(REVERSE ASSIGN. ((CAT X) REVERSE))


An assignment rule consists of a string name followed
by an equal sign and a string expression. The execution of
an assignment rule results in assigning the string value of
the string expression to the string name.


Rules 


PAT MATCH RULE<r..r'> | INPUT RULE<r..r'> | OUTPUT RULE<r..r'> |
ASSIGN RULE<r..r'> → UNLABELED RULE<r..r'>;

UNLABELED RULE<r..r'> → RULE<␣r..r'>;

UNLABELED RULE<r..r'>, NAME<n> → RULE<n␣r..n: r'>;


Example Rules:


   NAME = NAME REVERSE..(NAME ASSIGN. ((CAT NAME) REVERSE))
L4 NAME = NAME REVERSE..L4: (NAME ASSIGN. ((CAT NAME) REVERSE))

A rule must be prefixed by a sequence of blank spaces or
a name. A name prefixing a rule is called a label and is
used to identify a rule when the normal order of evaluation
is to be interrupted and to be continued at the labeled rule.




Statements 


NAME<n> → LABEL EXP<n.. .n>;

STR NAME<n> → LABEL EXP<$n..(LOOKUP. ((CAT '.') n))>;

RULE<r..r'>, LABEL EXP<ℓ..ℓ'>,<m..m'>

     → STM<r..r'>,<r/(ℓ)..r';(GOTO. ℓ')>,

          <r/S(ℓ)..r' => (GOTO. ℓ') ELSE => 'Λ'>,
          <r/F(m)..r' => 'Λ' ELSE => (GOTO. m')>,
          <r/S(ℓ)F(m)..r' => (GOTO. ℓ') ELSE => (GOTO. m')>,
          <r/F(m)S(ℓ)..r' => (GOTO. ℓ') ELSE => (GOTO. m')>;


Example Statement:


L3 REVERSE = “,” NAME REVERSE /(L2) ..
     L3: (REVERSE ASSIGN. ((CAT ((CAT ',') NAME)) REVERSE));
     (GOTO. .L2)


A label expression in SNOBOL/1 is an expression whose
string value is a label. A label can be referenced directly
by giving the name of a label or by giving a string name whose
value is a label and prefixing the string name by a dollar
sign.

A statement consists of one of the strings "r", "r/(ℓ)",
"r/S(ℓ)", "r/F(m)", "r/S(ℓ)F(m)", or "r/F(m)S(ℓ)", where r is
a rule and ℓ and m are label expressions. The execution of
a statement of the form "r/(ℓ)" results in executing rule r
and then transferring control to the statement designated by
the label expression ℓ. The execution of a statement of the form
"r/S(ℓ)" results in evaluating rule r and then transferring
control to the statement designated by the label expression
ℓ if the rule (presumably a pattern matching rule or input
rule) succeeded in matching the pattern in the rule to its
object string. Similarly, a statement of the form "r/F(m)"
results in transferring control to the statement designated
by m if the execution of rule r failed to match the pattern
in the rule to its object string. Finally, statements of
the form "r/S(ℓ)F(m)" or "r/F(m)S(ℓ)" result in transferring
control to the statement designated by ℓ if the execution of
rule r succeeded in matching its pattern to its object string,
and to the statement designated by m if it failed.
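
The success and failure transfers can be summarized by a small control loop. In the Python sketch below, which is an informal model and not the target language mechanism, each statement is a tuple of a label (or None), a rule given as a function returning True on success, and the labels to go to on success and on failure (None meaning "continue with the next statement"). An unconditional "r/(ℓ)" is modeled by giving ℓ for both outcomes. The name run and the returned trace of executed positions are hypothetical; rules that do no matching are assumed to succeed.

```python
def run(statements, start):
    """Execute statements from the one labeled start; return the positions visited."""
    position = {
        label: index
        for index, (label, _rule, _s, _f) in enumerate(statements)
        if label is not None
    }
    trace = []
    pc = position[start]
    while pc < len(statements):
        _label, rule, on_success, on_failure = statements[pc]
        succeeded = rule()
        trace.append(pc)
        target = on_success if succeeded else on_failure
        # No applicable label: fall through to the next statement.
        pc = pc + 1 if target is None else position[target]
    return trace
```

Execution stops when control runs past the last statement.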


Statement Sequences* 


STM<s..s'> → STM SEQ<s..s'>;
STM SEQ<q..q'>, STM<s..s'> → STM SEQ<q↵s..q';s'>;
STM SEQ<q..q'>, STRING<s> → STM SEQ<q↵*s..q'>,<*s↵q..q'>;


Example Statement Sequence:


L4 REVERSE = X REVERSE       L4: (REVERSE ASSIGN. ((CAT X)
                                  REVERSE));
   SYS .PRINT REVERSE        (PRINTER# ASSIGN. ((CAT
                                  PRINTER#) REVERSE));


A statement sequence consists of a list of statements
each on a new line. The statements are executed in order
unless a statement explicitly specifies a transfer of control.
Arbitrary character strings prefixed by an asterisk can be
inserted among statements. The character strings provide
comments for the programmer and are not evaluated.


*The symbol "↵" denotes a new line.




SNOBOL/1 Programs*


STM SEQ:STR REFS<q..q':s>, NAME<n>, LIST:BYS:CORR NULL LIST
<s:v:z> → SNOBOL PROGRAM<q END n..LET v=z IN (GOTO. .n); q'>;


Example Program:


L1      SYS .READ *X*
L2      X *NAME* “,” =                 /S(L3)F(L4)
L3      REVERSE = “,” NAME REVERSE     /(L2)
L4      REVERSE = X REVERSE
        SYS .PRINT REVERSE
END     L1


Translation:


LET X,NAME,REVERSE = 'Λ','Λ','Λ'
IN  (GOTO. .L1);
L1: (MATCH_AND_ASSIGN(READER#,'X',λm.'Λ','X∈STR |','(X,)'));
L2: (MATCH_AND_ASSIGN(X,((CAT 'NAME') ','),λm.'Λ',
         'NAME∈STR |','(NAME,)'))
         => (GOTO. .L3) ELSE => (GOTO. .L4);
L3: (REVERSE ASSIGN. ((CAT ((CAT ',') NAME)) REVERSE));
    (GOTO. .L2);
L4: (REVERSE ASSIGN. ((CAT X) REVERSE));
    (PRINTER# ASSIGN. ((CAT PRINTER#) REVERSE));


*Like the list of variable names, the list of string names
used in a SNOBOL/1 program is generated in the canonical system
for syntax and is used in the canonical system for the
translation to form the list of bound variables for the target
language translation of a program.

The predicate "LIST:BYS:CORR NULL LIST" names a set of
ordered triples, where the first element of each triple is
a list of names (e.g., X,Y,X,ALPHA,Y,), the second element
is a name list containing one occurrence of each name in
the first list (e.g., X,Y,ALPHA), and the third element is
a list of null strings with the same number of elements as
the second list (e.g., "Λ","Λ","Λ"). This predicate is used
to set the list of string names in a program to bound
variables each with the initial value of a null string.
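
The relation named by this predicate is easy to state procedurally. The Python sketch below computes the second and third elements of a triple from the first; the function name names_and_nulls is hypothetical, and Python's empty string stands for the null string Λ.

```python
def names_and_nulls(names):
    """Return (names without repetitions, in first-occurrence order; matching null strings)."""
    unique = list(dict.fromkeys(names))  # dicts preserve insertion order
    return unique, [""] * len(unique)


print(names_and_nulls(["X", "Y", "X", "ALPHA", "Y"]))
```

For the list X,Y,X,ALPHA,Y this yields the names X,Y,ALPHA and three null strings, as in the example of the footnote.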




A SNOBOL/1 program consists of a statement sequence
followed by a statement of the form "END n", where "END" is
a label and "n" designates the label of some statement in
the statement sequence. The execution of a program begins
by initializing the string values of the string names in the
program to null and then executing the statements in the
program beginning with the statement labeled by "n".

The example program above reads in a string from the
input device and outputs the string computed from the input
string by reversing the order of the substrings separated by
commas. For example, if the string "HESSE,KAFKA,MANN"
is on the input device, the string "MANN,KAFKA,HESSE" is
printed on the output device.
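
The behavior of the example program can be traced with a direct Python transcription. This is a hand-written paraphrase under assumptions, not output of the formal system: the function reverse_names is hypothetical, the pattern *NAME* “,” is rendered as a lazy regular expression, and the loop stands for the transfers between L2 and L3.

```python
import re


def reverse_names(reader):
    """Simulate the example program on the input string reader."""
    env = {"X": "", "NAME": "", "REVERSE": ""}  # all string names start null
    printer = ""
    # L1: SYS .READ *X*
    env["X"], reader = reader, ""
    while True:
        # L2: X *NAME* "," =    /S(L3)F(L4)
        match = re.search(r"(?P<NAME>.*?),", env["X"], re.DOTALL)
        if match is None:
            break  # failure: go to L4
        env["NAME"] = match.group("NAME")
        env["X"] = env["X"][:match.start()] + env["X"][match.end():]
        # L3: REVERSE = "," NAME REVERSE    /(L2)
        env["REVERSE"] = "," + env["NAME"] + env["REVERSE"]
    # L4: REVERSE = X REVERSE
    env["REVERSE"] = env["X"] + env["REVERSE"]
    # SYS .PRINT REVERSE
    printer += env["REVERSE"]
    return printer


print(reverse_names("HESSE,KAFKA,MANN"))
```

Each pass through L2 removes the leading name and its comma from X, and L3 places that name at the front of REVERSE, so the names accumulate in reverse order.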


Context-Sensitive Requirements on the Syntax of SNOBOL/1 


There are a few context-sensitive requirements on the
syntax of SNOBOL/1:


(a) The variable names in a pattern must each be
    different.


(b) The back-reference names in a pattern must be
    identical to the variable names and different from
    the string names.


(c) The labels in a program must each be different and
    each reference to a label in a label expression
    must refer to a name that actually occurs as a
    label.


These requirements are specified in the canonical system for
the syntax of SNOBOL/1 by specifying with each construct


(a) the lists of names used as string names, variable
    names, and back reference names (productions 3 of
    Appendix 3.1),


(b) the lists of names used as labels (production 11.3)
    and names used to refer to labels (production 12.1),


and specifying


(a) that the list "r_v" of variable names in a pattern
    must contain names each of which is different (the
    premise "DIFF NAME LIST<r_v>" in production 6.8),


(b) that the list "r_b" of back reference names in a
    pattern must be contained within the list "r_v" of
    variable names and that the list "r_s" of string
    names in a pattern must be disjoint from the list
    "r_v" of variable names (the premise "L1:L2:INTERSEC
    <r_b:r_v:r_b>,<r_s:r_v:Λ>" in production 6.8), and


(c) that the list of labels in a program must contain
    names each of which is different and that each
    label reference must be contained in the list of
    labels (production 14).


The additional predicates "DIFF NAME LIST" and "L1:L2:INTERSEC"
are defined at the end of Appendix 3.1.
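
The requirements on the names in a pattern amount to three set conditions, which the following Python sketch checks. It is an informal restatement, not the canonical system: the function check_pattern_names and its error messages are hypothetical, and the three lists correspond to the variable names, back reference names, and string names of a single pattern.

```python
def check_pattern_names(variables, back_refs, string_names):
    """Return a list of violated requirements (empty if the pattern is legal)."""
    errors = []
    # Variable names must each be different.
    if len(set(variables)) != len(variables):
        errors.append("variable names must each be different")
    # Back reference names must be contained in the variable names.
    if not set(back_refs) <= set(variables):
        errors.append("back reference names must occur as variable names")
    # String names must be disjoint from the variable names.
    if set(string_names) & set(variables):
        errors.append("string names must differ from variable names")
    return errors
```

The second and third checks mirror the two uses of the intersection predicate: the intersection of the back reference names with the variable names is the back reference list itself, and the intersection of the string names with the variable names is null.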


This chapter has attempted to describe in detail the
syntax and semantics of SNOBOL/1. It is intended that a
reader, having digested this chapter, would have sufficient
knowledge of SNOBOL/1 and its formal definition to be able
to use the compact, formal definition to answer further
questions concerning the syntactic legality or meaning of
a given SNOBOL/1 construct. It is hoped that this chapter
has served that objective.




CHAPTER V 


A SPECIFICATION OF THE SYNTAX AND SEMANTICS 
OF ALGOL/60 


This chapter exercises the formal system presented in 
this dissertation to specify the syntax and semantics of 
ALGOL/60, as defined in the official ALGOL/60 report edited
by Peter Naur. The intent of this chapter is not only to
explicate the formal specification of ALGOL/60, but also to 
relate the techniques used in the formal specification of 
ALGOL/60 to other languages and to compare the formal system 
presented here to other methods of language specification. 

A knowledge of ALGOL/60 is assumed in this chapter. 

It is surprising that, although ALGOL/60 is the official
publication language of the Association for Computing Machinery
and is accordingly widely publicized, the author knows of no
implementation of the complete language. Probably the most
important factor in this circumstance is the complexity of
ALGOL/60. Indeed, in writing this chapter I frequently found
myself in the difficult situation of first attempting to
understand ALGOL/60 and then attempting to characterize the language
with the formal system. There are many interrelated program
constructions and a complicated variety of restrictions on
programs that make the language difficult to understand and
define. Nevertheless, as an example of the formal system
applied to a somewhat complex computer language, a specification
of the syntax and semantics of ALGOL/60 is presented in
Appendix 4.*


Previous Work by Peter Landin: 


In his paper "A Correspondence Between ALGOL/60 and
Church's Lambda Notation," Peter Landin described the semantics
of ALGOL/60 in terms of a modified form of Church's λ-calculus,
called "imperative applicative expressions" or "IAEs". The
target language presented here is similar to Landin's
imperative applicative expressions in that the λ-calculus was
augmented to directly handle assignment and transfer of
control features of ALGOL/60. The target language differs
from imperative applicative expressions in that (a) the
mechanism to handle transfer of control here is different
from that of Landin, and (b) Landin's (SECD) machine to
evaluate imperative applicative expressions is specified by
a λ-calculus expression, whereas the machine to evaluate
target language expressions here is specified by an extended
Markov algorithm.

The specification of the semantics of ALGOL/60 given 
here is heavily based on Landin's definition. On the other 
hand, the dissertation here not only includes a specification 
of the semantics of ALGOL/60, but also a specification of
syntax and a definition of the primitive functions used in
specifying the semantics. The primitive functions used to
specify the semantics of ALGOL/60 are defined only by example
in Landin's paper.

*The specification of character spacing and of the use of
exponents in numbers is not included.


The Syntax of ALGOL/60 


The canonical system specifying the syntax of ALGOL/60 
is specified in Appendix 4.1. The first term in each speci- 
fied term tuple describes some string in ALGOL/60. If the 
auxiliary predicate parts and terms are deleted from this 
specification, Appendix 4.1 can be viewed as a partial (context- 
free) specification of the syntax. A context-free specification
of ALGOL/60's syntax exists in the ALGOL/60 report and
the specification of Appendix 4.1 closely parallels the
specification in this report. Although it does not completely
specify the syntax of the language, the context-free specification
of ALGOL/60 is fairly straightforward and the presentation
of the canonical system of ALGOL/60 will therefore
focus on the context-sensitive requirements.


Context-Sensitive Requirements on the Syntax of ALGOL/60 


There are myriad context-sensitive requirements on the 
syntax of ALGOL/60. Among these requirements are 


(a) The type of each identifier in a program must be 
declared. 


(b) An identifier cannot be used in conflicting contexts
in the same block. There are many variants
of this requirement. For example, an identifier
used as a real variable in a block cannot be used
as a boolean variable, an array identifier, a procedure
identifier, or a switch identifier.


(c) Any use of an array identifier must occur with a 
subscript list of the same dimension as that of 
the bound pair list in the array declaration. 


(d) The bound pair list in an array declaration can
depend only on variables that are non-local to the 
block in which the array declaration is given. 


(e) All statement labels in a block must be different. 

(f) The uses of actual parameters in a function designator
must be compatible with the uses of the corresponding
formal parameters in the procedure
declaration. There are many, many variants of
this requirement. For example, an actual parameter
that is declared to be a real variable cannot correspond
to a formal parameter that is used as a
boolean variable, an actual parameter that is a 
procedure identifier must correspond to a formal 
parameter that is used with arguments that are 
consistent with the procedure declaration, and an 
actual parameter that is an arithmetic expression 
cannot correspond to a formal parameter that is 
called by name and assigned a value in the procedure 
declaration. 

The context-sensitive requirements on the syntax of 
ALGOL/60 occur in many other computer languages besides 
ALGOL/60. The restriction (a) that the type of each identifier 
must be declared occurs in many computer languages. For 
example, in PL/1 each occurrence of an identifier used to 
name an object must be declared, either explicitly, contextually, 
or implicitly. An explicit declaration of an identifier is 
given through a DECLARE statement, whereby an identifier is 
given an attribute restricting the use of the identifier to
statements operating on certain classes of data, e.g., fixed
point numbers, character strings, or files. A contextual
declaration of an identifier is given when an identifier
occurs in a context where only one class of data objects can
occur, e.g., in the statement "GET FILE (X) DATA" the identifier
"X" is contextually declared as a member of the class
file in that only a file name can occur after the string "GET
FILE" in a GET statement. An implicit declaration of an
identifier is given when an identifier is associated with
other declared identifiers (e.g., in the statement
"T = A * B", if "A" and "B" are declared as fixed point numbers,
the identifier T may be implicitly declared as a fixed
point number). Programs not specifying a unique declaration
for each identifier are illegal.

The restriction (b) that identifiers cannot be used in
conflicting contexts occurs in almost every language where different
classes of data objects are distinguished. For example,
although PL/1 allows some identifiers to be used in different 
contexts, many contexts of declared identifiers are considered 
illegal, e.g., if "X" is explicitly declared as a bit string, 
the statement "GET FILE (X) DATA" is illegal since the GET 
statement contextually declares "X" as a file. 

The restriction (e) that all statement labels in a block
must be different occurs in almost every language allowing
statements to be labeled and control to be passed to a labeled
statement. The labels must be different in order for the
destination of the transfer of control to be unique. For
example, in Fortran IV no two statements may be labeled with
the same statement number.




The restriction (f) that corresponding actual and formal 
parameters must be compatible likewise occurs in many lan- 
guages and can become complicated, especially in languages 
allowing nested procedure definitions and applications like 
ALGOL/60. 

The author knows of only one major computer language 
where a complete formal specification of its syntax has been 
given. In particular, the simulation language GPSS has been 
specified completely by Donovan, using canonic systems.
Otherwise, the syntax of many computer languages has been 
specified either informally or has been partially formalized, 
usually with a context-free grammar. 

Before discussing the specification of the context- 
sensitive requirements on the syntax of ALGOL/60, the reader 
is reminded that the auxiliary predicate parts and terms in 
a production generally specify the lists of identifiers,
labels, variables, etc., that are used within the source
language string specified by the first term in the production.
These lists will be referred to repeatedly in the productions
to follow.


Specification of the Requirement that the Type of Each Variable 
Must be Declared: 


Consider the (abbreviated) production* from the canonical
system of the syntax of ALGOL/60:

ID<i> → REAL VAR:R VARS<i:i,>;

*The productions given in the text will generally be only portions
of the corresponding productions given in Appendix 4.
Portions of productions are given in the text to illuminate
better the particular construction under discussion. An explication
of the complete canonical system for ALGOL/60 will be
given later in the chapter.


If "i" designates a string that is an identifier, the term
tuple "<i:i,>" designates a pair where the first element is
an identifier used as a real variable, and the second element
designates the addition of the identifier to the list of
identifiers used as real variables in a program. Consider
also the production

IDLIST<ℓ> → TYPE DEC:DEC R VARS<REAL ℓ:ℓ,>;

If "ℓ" designates a string that is a list of identifiers,
the term tuple "<REAL ℓ:ℓ,>" designates a pair where the first
element is an ALGOL/60 declaration of a list of identifiers
as real variables, and the second element designates the addition
of the list of identifiers to the list of identifiers
declared as real variables.


Next consider the production 


STM SEQ:R VARS<s:v_u>, DEC SEQ:DEC R VARS<d:v_d>,
L1:L2:REL COMP<v_u:v_d:v'>
→ BLOCK:R VARS<BEGIN d;s END:v'>;

Here, if
(a) "s" is a statement sequence with a list "v_u" of
identifiers used as real variables,
(b) "d" is a declaration sequence with a list "v_d" of
identifiers declared as real variables, and
(c) "v'" is the list computed from "v_u" and "v_d" by
forming their relative complement (i.e., v_u - v_d),
then
(d) "BEGIN d;s END" is a block with a list "v'" of
identifiers that are used as real variables in the
block but not declared within the block.


Finally, consider the production 
PROGRAM STR:R VARS<p:Λ> → ALGOL PROGRAM<p>;

Here, if (a) "p" is a string that is in the form of a program
and (b) the list "R VARS" of identifiers that are used in the
program as real variables but are not yet declared is given
as null, then the string "p" is specified as a bona fide legal
ALGOL program.

In this manner (a) each identifier in a program used as
a real variable is added to the list of used real variables,
(b) each identifier declared as a real variable is added to
the list of declared real variables, (c) each identifier declared
in a block as a real variable is removed from the list
of identifiers used as real variables, and (d) a string is
specified as a legal program only if the list of used (but
as yet undeclared) real variables is given as null.
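
The bookkeeping in steps (a) through (d) can be sketched in Python. This is only an illustration under stated assumptions, not the canonical system itself: the names `Block`, `free_real_vars`, and `is_legal_program` are hypothetical, and the identifier lists are modeled as Python sets.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A block with its declared and used real variables and its nested blocks.
    (Illustrative model; not part of the formal system.)"""
    declared: set = field(default_factory=set)
    used: set = field(default_factory=set)
    inner: list = field(default_factory=list)


def free_real_vars(block: Block) -> set:
    """Identifiers used as real variables in the block but not declared in it,
    i.e. the relative complement (used - declared)."""
    used = set(block.used)
    for b in block.inner:
        used |= free_real_vars(b)  # undeclared uses propagate to the enclosing block
    return used - block.declared


def is_legal_program(block: Block) -> bool:
    """A program is legal only if the list of used but undeclared variables is null."""
    return not free_real_vars(block)


ok = Block(declared={"X"}, used={"X"},
           inner=[Block(declared={"Y"}, used={"X", "Y"})])
bad = Block(declared={"X"}, inner=[Block(used={"X", "Z"})])
print(is_legal_program(ok), is_legal_program(bad))
```

In the second program the identifier "Z" is used in the inner block but declared nowhere, so the list of undeclared real variables is not null at the program level.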


Specification That Identifiers Cannot be Used in Conflicting 
Contexts: 


Consider the following production

STM SEQ:R VARS:B VARS<s:v_r:v_b>, DEC SEQ<d>,
DISJ ENTRY LISTS<(v_r)(v_b)> → BLOCK<BEGIN d;s END>;

where the predicate "DISJ ENTRY LISTS" specifies a set consisting
of one or more identifier lists each enclosed in
parentheses such that each list is disjoint from the others.
If "v_r" and "v_b" specify the lists of identifiers used respectively
as real variables and boolean variables in a
statement sequence, the premise "DISJ ENTRY LISTS<(v_r)(v_b)>"
ensures that the string "BEGIN d;s END" is a legal block
only if the lists "v_r" and "v_b" are disjoint, i.e., not used
in conflicting contexts.
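
The disjointness premise amounts to a pairwise check over the identifier lists. A minimal Python sketch follows; the function name `disjoint_entry_lists` is an assumption chosen to echo the predicate name and does not come from Appendix 4.

```python
from itertools import combinations


def disjoint_entry_lists(*lists):
    """True if every pair of identifier lists is disjoint."""
    return all(set(a).isdisjoint(b) for a, b in combinations(lists, 2))


real_vars = ["X", "Y"]
bool_vars = ["B"]
# No identifier occurs in both lists, so the block would be accepted.
print(disjoint_entry_lists(real_vars, bool_vars))
# "X" is used both as a real and as a boolean variable, so it would be rejected.
print(disjoint_entry_lists(real_vars, bool_vars + ["X"]))
```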


Specification That Actual and Formal Parameters Must Be 
Compatible: 


The requirements on the uses of actual and formal parameters
of ALGOL/60 procedures are complicated. For example,
let "P(X,A)" be a declared procedure with two formal parameters
"X" and "A", where in the declaration of "P", "X" is used as
a real variable and "A" is used as an integer array of dimension
three. The function designator "P(3.1,Q)", where "Q"
is a declared integer array of dimension three, would constitute
a legal activation of the procedure "P", whereas the
function designator "P(TRUE,Q)" would not be legal since the
type "REAL" of "X" and the type "BOOLEAN" of "TRUE" are not
compatible.




To specify the context-sensitive requirements on proce- 
dures, a number of additional predicates are defined. For 
simplicity, in the discussion to follow I will assume that 
ALGOL/60 has only three data types: real variables, boolean
variables, and integer arrays. Consider the following productions:

DIMM<1>;
DIMM<m> → DIMM<m1>;
SPEC<REAL>,<BOOLEAN>;
DIMM<m> → SPEC<INTEGER ARRAY(m)>;
SPEC<s> → SPEC LIST<s>;
SPEC<s>, SPEC LIST<ℓ> → SPEC LIST<ℓ,s>;

Here the predicate "SPEC" specifies a set comprising the 
strings {REAL BOOLEAN INTEGER ARRAY(1) INTEGER ARRAY(11) 
INTEGER ARRAY(111) ...}, where each string specifies the use 
of some formal parameter in a procedure declaration. The 
predicate "SPEC LIST" specifies a set where each member is 
a string of parameter specifications each separated by a 
comma. 

For example, if "P" is a procedure declared as above, 
the specification list for the formal parameters of "P" would 
be "REAL,INTEGER ARRAY(111)". Similarly, if "P(3.1,Q)" and
"P(TRUE,Q)" are function designators where "Q" is declared
as an integer array of dimension three, the specification
list for "P(3.1,Q)" would be "ARITH EXP,INTEGER ARRAY(111)"
and the specification list for "P(TRUE,Q)" would be "BOOL
EXP,INTEGER ARRAY(111)". In the specification of the syntax
of ALGOL/60, a predicate "SPEC MATCH" is defined. The ordered
pair "<ARITH EXP,INTEGER ARRAY(111):REAL,INTEGER ARRAY(111)>"
is a member of this predicate, and thus, by using this predicate
as a premise in the canonical system for ALGOL/60, the
function designator "P(3.1,Q)" is allowed as a compatible
function designator with the above indicated declaration of
"P". On the other hand, the ordered pair "<BOOL EXP,INTEGER
ARRAY(111):REAL,INTEGER ARRAY(111)>" is not a member of this
predicate, and thus the function designator "P(TRUE,Q)" is
not allowed as a compatible function designator for "P".
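
The two membership examples can be reproduced with a small Python sketch. It is a sketch under assumptions, not the definition of "SPEC MATCH" from Appendix 4: the compatibility table `COMPATIBLE` covers only the cases mentioned in the text, and the names `is_spec` and `spec_match` are hypothetical.

```python
import re


def is_spec(s: str) -> bool:
    """Membership in SPEC for the simplified three-type language:
    REAL, BOOLEAN, or INTEGER ARRAY(m) with m a unary numeral (1, 11, 111, ...)."""
    return s in ("REAL", "BOOLEAN") or re.fullmatch(r"INTEGER ARRAY\(1+\)", s) is not None


# Assumed table: which formal specifications an actual-parameter form may match.
COMPATIBLE = {"ARITH EXP": {"REAL"}, "BOOL EXP": {"BOOLEAN"}}


def spec_match(actual: list, formal: list) -> bool:
    """True if each actual specification is compatible with its formal counterpart."""
    if len(actual) != len(formal):
        return False
    for a, f in zip(actual, formal):
        if not is_spec(f):
            return False
        if a != f and f not in COMPATIBLE.get(a, set()):
            return False
    return True


formal = ["REAL", "INTEGER ARRAY(111)"]
print(spec_match(["ARITH EXP", "INTEGER ARRAY(111)"], formal))  # case of P(3.1,Q)
print(spec_match(["BOOL EXP", "INTEGER ARRAY(111)"], formal))   # case of P(TRUE,Q)
```

Array specifications are matched here by string equality, so an array of dimension two would not match a formal parameter of dimension three.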

Since the number of data types in ALGOL/60 is much greater 
than the number of types assumed in the examples just given, 
the actual specification of the context-sensitive requirements 
is much more complicated than indicated in the previous paragraphs.
A detailed discussion of the complete canonical
system specification of the context-sensitive requirements
on ALGOL/60 procedures is given at the end of this chapter.


The Semantics of ALGOL/60 

It seems that much less work in computer science has been 
directed to formalizing semantics than in formalizing syntax. 
While many methods for characterizing (at least in part) the 
syntax of computer languages have been successfully developed, 
few methods for characterizing semantics have reached a 
development where entire languages have been characterized. 
An application of the λ-calculus has been used by Peter Landin
and John Wozencraft to characterize respectively the semantics
of ALGOL/60 and the classroom language PAL. The characterization
of semantics given in this dissertation is in
part based on these efforts.

A quite different approach to characterizing semantics 
has been taken by the IBM Vienna laboratory, which has under- 
taken the formidable task of characterizing the semantics of 
PL/1. This group has used portions of LISP, the predicate 
calculus, set theory, and other constructs of their own inven- 
tion to characterize the semantics of PL/1. Their work has 
been described in several lengthy IBM technical reports. A 
judgment of the utility of their approach awaits a more 
digestible presentation of the formal system and the techniques
used within the formal system.

The specification of the semantics of ALGOL/60 in terms 
of the target language presented here is given in Appendix
4.2. Much of the semantics of ALGOL/60, e.g., arithmetic
expressions, boolean expressions, designational expressions,
conditional statements and statement sequences, is straightforwardly
defined in the target language and in part has
been discussed in previous chapters. I will therefore focus
the discussion of this chapter on some constructs in ALGOL/60
whose semantics are not quite as obviously expressed in terms
of the target language.

The table on the following pages lists several example
ALGOL/60 expressions and their translations into the target
language. In the discussion to follow, the reader may find it
helpful to refer to these examples.




EXAMPLE ALGOL/60 EXPRESSIONS AND THEIR TRANSLATIONS
INTO THE TARGET LANGUAGE


Syntactic Type | ALGOL/60 Expression | Translation into the Target Language
IDs Al _ : 


"65" 
aun (mgcate °65') 

sun (o(TRANS_INT °65',TRAMS_FRAC *32')) 

ID a 

ID (XZ 'A') e0if X is m formel parameter called by name 
van a 

VaR afi,x) (cer_au(f{comv_to_rat 2"), (comv_to tm? x}e3)) 

PCR DES P (P fat) 

PCN DRE Q(X,7, 202) (Q(ae.x,an.¥,An.(e(Z,2)))) 

ARITH EXP AtBec (+(a,o(B,C))) 

ARITH EXP IP B THEE O ELSE 1 [|B <> ‘'O' ELSE => ‘1° 

DBs EXP ALPRA «ALPHA 

DES EXP 009 7) 

DES BxP s[{x) ((GxT_BL(comv_ro_rvt x, S)) *4") 


COMMERBT STH COMMENT THIS IS ‘at 
A COMMEZT 


Goto STN GO TO 009 (coro. .9) 


ASOT STH Poet LET »=(CONY_TO_INT X) IN (P# ASSIGH. =) 
If P is an integer procedure identifier 


ASOT STN Ate Bis X LET e=(CONV_TO_IBT ZX) IN LET aeB IN (a ASSIGN. +); 
LET cea IB (a ASSIGN. 7) 

If A end B are integer vars 

FOR LIST EL X STEP 1 UNTIL 5 Aw STEP(D#,.X,4. "1°, . 45°) 


FOR ste FOR ¥=1,2 DO ¥:=¥Voi} (POR(Y,DELAY_caT[>*. 12° ae. '2°Y,LET e=(comy TO ret(o(V,1))) 
IN LET aeV IW (a ASSIGN. +) 


UNcoED StH ALPHA: GO TO 009 ALPHA: (GOTO. .9) 


COND STH IP? BeTRUE {=(B, "TRUE'}) SH (COTO. .ALPHA) ELSE =p ‘4° 
TEEN GO TO ALPHA 


TYPE DEC REAL X,¥,2 XpYeZ eo MATS TAT TAT 
TYPE prc OWN REAL X,Y,2 X,Y,Z = X#1,Y#1,742 


ARRAY DEC REAL ARRAY Al1:10, fa = (wame_tistifi' 14 02°, 2 $fi07, 20) 
1710 


amRar DEC OVE REAL anna? Ae (neser_orat (ann Fi JG PEi0+ 0%) 
A(1:10,1:16 


sv pac BVITCH S:eALPHA,OO9 Js = (ruDEx_List('1* [aLPua,. 9p) 


PROC DEC REAL PROCEDURE P(X,Y}P(X,Y) = LET Pe,x = "A", {UNSHAKE (X °A*)) 
VALUE X: P:ek+Y IN LET w=(CONV_TO_REAL (¢(x,(Y 'A'))) 
Im (P#@ aSSTGN. +); Pe 


BEGIN REAL X,Y; LET A&C X,ye's",°A* 
XY se 3; Im LET e*(CONY_TO_REAL ‘3°) Im LET 2eY IN (a ASSIGN. * 
ats Jer LET aX IN (a ASSIGN. * 
ead LET w=(COMV_TO REAL (0(X,¥)) IN LET wA TW (a ASSION. * 


ALOOL PROGRAM { BEGIN REAL A,3; LET REC A,B, F(X, Yet’, AS LET FA, Kok", (UMCHARE (X *A°)) 
EAL NOUS F(x,y); IB LET COMV_TO_REAL(@(X,(/((¥ "a"),*2°)))) 
VA; In (re assTowy «5 Pe 
F re 107/23 In LET v=(COMY_TO_REAL '3') IN LET ow I® {o-ASSIGN. ©); 
A: 3; LET w=(COMV™TO_REAL (o(A,P(Aw.'S* 18.A)))) 
Bore AOP(4, A); IN LET a=B I# (0 ASSIGN. ©) 
mad 




Primitive Functions Used to Define the Semantics of ALGOL/60: 


Appendix 4.3 defines the primitive functions used in 
defining the semantics of ALGOL/60. Appendices 4.3a and 4.3b
define miscellaneous primitives, like the function "NEQ" for
negating a boolean value, the function "HD" for computing the
head of a list, and the function "ABS" for computing the
absolute value of a number. Real numbers in ALGOL/60 are
represented in the target language by their fractional equivalent.
A fraction in the target language is a string of the
form "xDy", where x and y represent respectively the numerator
and denominator of the fraction. For example, the real number
"1.5" in ALGOL/60 is translated into the target-language
string "3D2" denoting the fraction three-halves (3 Divided by
2). Appendix 4.3c defines the primitives "TRANS_INT" and
"TRANS_FRAC" for converting real numbers to their fractional
representation and the primitives "CONV_TO_REAL" and
"CONV_TO_INT" for converting integer numbers to real numbers and
real numbers to integer numbers. Appendices 4.3d and 4.3e
define the arithmetic and boolean primitives.
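
The "xDy" representation can be illustrated with Python's standard `fractions` module. This is a sketch only: the helper names `to_target_fraction` and `from_target_fraction` are hypothetical, the definitions of "TRANS_INT" and "TRANS_FRAC" in Appendix 4.3c are not reproduced, and reduction to lowest terms is assumed because it agrees with the "1.5" to "3D2" example.

```python
from fractions import Fraction


def to_target_fraction(literal: str) -> str:
    """Convert a decimal literal such as "1.5" into an "xDy" string,
    with the fraction reduced to lowest terms (an assumption)."""
    f = Fraction(literal)
    return f"{f.numerator}D{f.denominator}"


def from_target_fraction(s: str) -> Fraction:
    """Parse an "xDy" string back into an exact fraction."""
    numerator, denominator = s.split("D")
    return Fraction(int(numerator), int(denominator))


print(to_target_fraction("1.5"))
print(from_target_fraction("3D2") + from_target_fraction("1D2"))
```

Because the fractions are exact, arithmetic on them involves no rounding, which is one plausible motivation for representing real numbers this way in a formal definition.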

Appendices 4.3f and 4.3g define the primitives used in 
defining the semantics of for statements and arrays and will 
be discussed later in the text. 

Primitive functions similar to those given for ALGOL/60 
can be used to define the semantics of many languages used 
for numerical processes. For example, in FORTRAN IV, the
arithmetic and boolean primitives almost exactly parallel
those for ALGOL/60. Although FORTRAN IV allows the user to
(a) specify one of two precisions for real number arithmetic
and (b) specify arithmetic for complex numbers, these facilities
can be readily specified in the target language by (a) defining
a primitive that converts target language fractions to the
desired precision as real numbers and (b) defining the arithmetic
operators for complex numbers in terms of those given
for real numbers. Similarly, the FORTRAN IV facilities for
arrays and DO statements closely parallel the ALGOL/60 facilities
for arrays and for statements.


Assignment of Values to Variables and Procedures: 


Consider the following ALGOL/60 assignment statements:

A := X
F := X
F := A := X

where "X" is an integer variable, "A" is a real variable, and
"F" is a real procedure identifier. The corresponding target
language expressions for these statements are:

LET π = (CONV_TO_REAL X) IN LET α = A IN (α ASSIGN. π)

LET π = (CONV_TO_REAL X) IN (F# ASSIGN. π)

LET π = (CONV_TO_REAL X) IN (F# ASSIGN. π);
LET α = A IN (α ASSIGN. π)




The expression on the right side of an assignment statement
must be evaluated only once. Therefore, the translation
of the right-hand expression is evaluated once and is linked
with the dummy variable "π", and the value of "π" is used in
each target language assignment expression. The primitive
"CONV_TO_REAL" is applied to "π" before the assignment to
convert the value of "π" to a real number.

Assignments in the target language can only be made to target
language variables. The ALGOL/60 variables in the left side of the
assignment statement are linked with the dummy target language variable
"α" to handle the case where the ALGOL/60 variable is a formal
parameter called by name and the ALGOL/60 variable must be
translated into a target language expression that is not a
variable. (This point will be discussed shortly.) By linking
the dummy variable "α" with the translation of the expression
representing the ALGOL/60 variable, an assignment to "α" will
also result in an assignment to the corresponding ALGOL/60
variable.

The assignment of a value to a procedure in a procedure
declaration is handled by affixing the mark "#" to the procedure
identifier and assigning the value of the right-hand
expression to this newly formed identifier. The "#" is affixed
to the identifier to avoid conflicts with the use of the procedure
identifier in a recursive call to the procedure. In
the translation of the entire procedure declaration, the
translation of the last statement in the declaration is
followed by the statement "F#", where F is the procedure
identifier. Thus the evaluation of the procedure will return
the value currently assigned to the procedure identifier.


Parameters Called by Name and Called by Value: 


Consider the following ALGOL/60 procedure declaration: 


PROCEDURE F(X,Y); VALUE Y;
BEGIN
Y := Y+Y;
X := Y*Y;
END


In this procedure declaration the formal parameter "X" is 
called by name and the formal parameter "Y" is called by 
value. If "A" and "B" are real numbers whose current values 


are "1" and "2", the evaluation of the procedure statement 
F(A,B); 


results in changing the value of "A" to "4" while leaving the 
value of "B" unchanged. 

Next consider the following target language translations 
of the procedure declaration given above and procedure state- 
ment "F(A,B)": 


LET F(X,Y) = LET Y = (UNSHARE (Y 'Λ'))
             IN LET π = (CONV_TO_REAL (+(Y,Y)))
                IN LET α = Y IN (α ASSIGN. π);
                LET π = (CONV_TO_REAL (*(Y,Y)))
                IN LET α = (X 'Λ') IN (α ASSIGN. π)

and

F(λπ.A, λπ.B)




Here, the translations of the actual parameters "A" and "B"
are given as functions mapping the dummy variable "π" into
the variables "A" and "B". In the evaluation of the procedure
statement "F(A,B)", the function "λπ.B" will be applied
to the null string (causing the evaluation of "B") and the
function "UNSHARE" (Appendix 4.3a) will be applied to this
value (causing the formation of a new cell in the store for
the value of "B"). Thus subsequent assignments to the formal
parameter "Y" will not result in changing the value of "B".
On the other hand, the function "UNSHARE" is not applied to
"X" and the assignment of a value to "X" will result in
changing the value of the corresponding actual parameter "A".
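
The difference between the two parameter mechanisms can be sketched in Python by passing parameterless functions (thunks) that return storage cells. Everything here is an illustrative assumption: `Cell` models a cell in the store, `unshare` imitates the role described for "UNSHARE", and the procedure body (Y := Y+Y; X := Y) is a simplified stand-in rather than the body of the declaration above.

```python
class Cell:
    """A mutable cell in the store (illustrative model)."""
    def __init__(self, value):
        self.value = value


def unshare(cell: Cell) -> Cell:
    """Form a new cell holding the same value, so that later
    assignments do not affect the original cell."""
    return Cell(cell.value)


def F(x_thunk, y_thunk):
    y = unshare(y_thunk())        # Y is called by value: work on a private copy
    y.value = y.value + y.value   # Y := Y+Y (changes only the copy)
    x = x_thunk()                 # X is called by name: obtain the caller's cell
    x.value = y.value             # X := Y (changes the actual parameter)


a, b = Cell(1), Cell(2)
F(lambda: a, lambda: b)           # corresponds to F(λπ.A, λπ.B)
print(a.value, b.value)
```

Only the cell reached through the by-name thunk is shared with the caller, which is why the assignment to "X" is visible outside the procedure while the assignment to "Y" is not.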


Lists in ALGOL/60: 


In defining the semantics of ALGOL/60, it will be convenient
to define primitive functions operating on lists of
strings. I will use the notation

s_1,s_2,...,s_n

where the s_i, 1≤i≤n, are strings, to denote a list. If
x_1, x_2, ..., x_n are expressions whose values are the strings
s_1, s_2, ..., s_n, the expression

(1) ((cat ... ((cat ((cat ((cat x_1) ',')) x_2)) ',')) ... x_n)

will result in forming the list

s_1,s_2,...,s_n

The concatenation of expressions to form lists will occur
frequently in the formal definition of ALGOL/60. For convenience,
I will generally omit the explicit specification of
the concatenation of the component expressions of a list and
write list expressions of the form (1) in the alternate notation

[x_1, x_2, ..., x_n]


Arrays and Switches: 


An array in ALGOL/60 is treated in the target language
as an indexed linear list, where the number of elements in
the list equals the number of elements in the array. For
example, an array with a bound pair list

[1:2,1:3]

is translated into the string

(1,1,Λ),(1,2,Λ),(1,3,Λ),(2,1,Λ),(2,2,Λ),(2,3,Λ)

where the symbol "Λ" specifies an initial null value for each
element of the array. The translation of arrays into lists is
handled through the function "MAKE_LIST" (Appendix 4.3g), which
converts the bound pair list of the array into a linear list
of array elements each with an initial null value. An element
of an array is obtained through the function "GET_EL"
(Appendix 4.3g), which, given a subscript list and an array
identifier, obtains the appropriate array element. The
elements of an array are updated with new values through the
function "RESET_LIST", which resets the value of one of the
array elements in the array list.
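
The treatment of arrays as indexed linear lists can be sketched in Python. The function names below echo "MAKE_LIST", "GET_EL", and "RESET_LIST", but their signatures and argument order are assumptions made for illustration; the actual definitions are those of Appendix 4.3g. `None` stands in for the null value.

```python
from itertools import product

NULL = None  # stands in for the null value


def make_list(bounds):
    """Convert a bound pair list, e.g. [(1, 2), (1, 3)], into a linear list
    of (index tuple, value) entries, each with an initial null value."""
    ranges = [range(lo, hi + 1) for lo, hi in bounds]
    return [(idx, NULL) for idx in product(*ranges)]


def get_el(subscripts, array_list):
    """Obtain the element selected by a subscript list."""
    for idx, value in array_list:
        if idx == tuple(subscripts):
            return value
    raise IndexError(f"subscripts {subscripts} out of bounds")


def reset_list(subscripts, new_value, array_list):
    """Return the array list with one element reset to a new value."""
    return [(idx, new_value if idx == tuple(subscripts) else value)
            for idx, value in array_list]


a = make_list([(1, 2), (1, 3)])
print([idx for idx, _ in a])
a = reset_list((2, 1), 7, a)
print(get_el((2, 1), a))
```

The elements come out in row-major order, the same order as in the six-element string for the bound pair list [1:2,1:3].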

Switches are also treated as linear lists. For example,
a switch with a switch list "L,M,N" is translated into the
target language string "(1,λπ. .L),(2,λπ. .M),(3,λπ. .N)". The
elements of the target language list are given as dummy
variable functions so that an element of a switch list is
not evaluated unless the element is selected by a designational
expression. The translation of switches into lists
is handled through the primitive function "INDEX_LIST" (Appendix
4.3g), which forms an indexed list of switch elements.
An element of a switch list is obtained by applying the function
"GET_EL" to the switch list and then applying the selected
element to the null string. This application results in
forming the proper label-closure for the label.
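
The delayed evaluation of switch elements can be sketched in Python with parameterless functions. The names `index_list`, `select`, and `label` are hypothetical, and a label is modeled simply as a string rather than as a label-closure.

```python
def index_list(*thunks):
    """Form an indexed list (1, t1), (2, t2), ... of unevaluated elements."""
    return list(enumerate(thunks, start=1))


def select(switch_list, n):
    """Obtain element n and only then evaluate it."""
    for i, thunk in switch_list:
        if i == n:
            return thunk()
    raise IndexError(f"no switch element {n}")


evaluated = []  # records which elements were actually evaluated


def label(name):
    """Build a thunk for a label; evaluation is recorded for demonstration."""
    def thunk():
        evaluated.append(name)
        return name
    return thunk


s = index_list(label("L"), label("M"), label("N"))
print(select(s, 2), evaluated)
```

Only the selected element "M" is evaluated; "L" and "N" remain untouched, which mirrors the reason given above for translating switch elements as dummy variable functions.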


Own Variables: 


Consider the following outlined ALGOL/60 program: 


BEGIN 
REAL X,Y,Z;
PROCEDURE F(A); BEGIN OWN X;  ... END; 


END 




and its target language translation 


LET X#1 = 'Λ'
IN LET REC X,Y,Z,F(A) = 'Λ','Λ','Λ', LET X = X#1 IN ...
IN ...


The variable "X" in the ALGOL/60 procedure "F" is an own 
variable, and hence on successive calls to the procedure "F" 
the value of "X" is not re-initialized to a null value but 
maintains the value last assigned to "X" on the previous call. 
In the target language translation of the program, a new 
global identifier "X#1" is created, and on each call to "F" 
the value of "X" is set to the value of "X#1". In this manner 
an assignment to the value of "X" will also result in an 
assignment to "X#1". Since "X#1" is global to the entire 
target language expression, "X#1" will maintain the value 

last assigned to "X" and subsequent calls to "F" will result 
in resetting "X" to its last assigned value. 

The mark "#" and positive integer are affixed to the 
global own identifiers so that these identifiers will not 
conflict with other identifiers in the target language 
expression. 
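
The coupling of an own variable with a global identifier can be sketched in Python by letting the local name share a global cell. The class `Cell`, the global `X_1` (standing in for "X#1"), and the body of `F` are assumptions made for illustration.

```python
class Cell:
    """A mutable cell in the store (illustrative model)."""
    def __init__(self, value=None):
        self.value = value


X_1 = Cell()  # plays the role of the global identifier X#1, initially null


def F(a):
    x = X_1               # LET X = X#1: the own variable shares the global cell
    if x.value is None:   # first call: the own variable is still null
        x.value = 0
    x.value += a          # an assignment to X is also an assignment to X#1
    return x.value


print(F(1), F(2))
```

The second call starts from the value left by the first call, so the own variable behaves as a running total instead of being re-initialized on each entry.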

Own arrays are treated similarly to own variables in
that the own array identifiers are coupled with corresponding
global identifiers. The global array identifiers are initialized
with null values. Upon each entry to a block with
an own array,

(a) the value of the global array identifier is updated
    to the value computed from the current value of the
    global identifier by (1) retaining the values of
    the array elements whose indices, as specified by
    the current value of the bound pair list, occur in
    the array list for the global identifier, and (2)
    setting to null the values of the array elements
    whose indices do not occur in the array list for
    the global identifier, and

(b) the value of the own array identifier is coupled with
    the value of the corresponding global array identifier.

Thus, upon the first entry to the block, each element of the
own array will be given as null. Since updating the value
of the local own array identifier will also result in updating
the value of the corresponding global array identifier,
subsequent entry to the block will result in resetting the
values of the previously given elements of the own array
identifier to their previous values and setting the value of
each array element not included in the previous bound pair
list to null.
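
Step (a) can be sketched in Python as a function that rebuilds the array list for the current bounds while retaining the values of indices that already exist. The name `rebound` and the representation of the array list as (index tuple, value) pairs are assumptions; `None` stands in for the null value.

```python
from itertools import product

NULL = None  # stands in for the null value


def rebound(global_list, bounds):
    """Compute the array list for the current bound pair list: keep the value
    of every index already present in the global list, use null otherwise."""
    old = dict(global_list)
    ranges = [range(lo, hi + 1) for lo, hi in bounds]
    return [(idx, old.get(idx, NULL)) for idx in product(*ranges)]


g = rebound([], [(1, 2)])                    # first entry: every element is null
g = [(idx, 10 * idx[0]) for idx, _ in g]     # assignments made inside the block
g2 = rebound(g, [(2, 3)])                    # later entry with different bounds
print(g2)
```

On the later entry the element with index 2 keeps its previous value, while the element with index 3, which was not covered by the previous bound pair list, is null.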

Own variables and own arrays have generally caused problems
for those implementing languages with own variables in
that special programs and storage areas have been needed to
properly implement own variables. The above mechanism for
handling own variables in the target language is quite
straightforward and avoids the complexity generally associated
with own variables.


Goto Statements: 


A statement of the form "GO TO L" in ALGOL/60, where L
is a label reference, will result in interrupting the normal
order of evaluation and continuing by evaluating the statement
labeled by L in the same sequence or in the first encompassing
block containing a statement with a label L. The mechanism
for transferring control to a target language expression in
the same or an encompassing sequence has been discussed in
Chapter III.

On the other hand, a more complicated situation for 
transferring control occurs when a label is passed as an 
argument to a procedure.* For example, consider the procedure 


statement 
F(L) 
and the procedure declaration 


PROCEDURE F(X); LABEL X; 
BEGIN 


GO TO X; 


END 
Since in the target language, the procedure statement is
translated as

F(λπ. .L)

where the λ-closure for "λπ. .L" is evaluated relative to the
environment within which the procedure statement occurs and
the GO TO statement is translated as

(GOTO. (X 'Λ'))

the label-closure for X will refer to the labeled statement
in the block in which the procedure statement occurs (or to a
labeled statement in an encompassing block) and the environment
given by the label closure will refer to the environment
of the block specified at the time when the procedure statement
was evaluated.

*Formal parameters that are labels called by value are excluded
according to the ALGOL/60 report.

Furthermore, consider the ALGOL/60 program:

BEGIN INTEGER A,B;
PROCEDURE F(I,X); LABEL X; VALUE I;
BEGIN M: B := B+1;
I := I+1;
IF B=4 THEN GO TO L1;
IF B=3 THEN GO TO X;
IF B=2 THEN F(I,X);
IF B=1 THEN F(I,M) END F;
L1: A := A*A
END

Here F is a recursive procedure that is called three times. 
On the second call to F the local label M is passed as an 
argument; the label-closure for M will specify an environment 
within which the value of I is 1. On the third call to F the 
GO TO statement "GO TO X" will result in resetting the environ- 
ment within which the value of I is 1, and upon exiting from 


the procedure the value of I will be 2, and not 3. 




Recursive Definitions: 


ALGOL/60 allows the declaration of variables, arrays,
switches, and procedures that can depend on each other. For
example, the following declaration sequence can occur within
a block

REAL PROCEDURE H1(X1); IF X1=0 THEN 1
ELSE X1*H2(X1-1);
REAL PROCEDURE H2(X2); IF X2=0 THEN 1
ELSE X2*H1(X2-1)

These declarations constitute a simultaneous recursive definition
of the factorial function (e.g., the value of the function
designator "H1(4)" is "24").
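
The same pair of mutually dependent definitions can be written directly in Python, where the lower-case names `h1` and `h2` are simply transliterations of the procedure identifiers. Each function refers to the other before the other is defined, which is the situation the translation of simultaneous recursive declarations has to handle.

```python
def h1(x1):
    # Refers to h2, which is defined below.
    return 1 if x1 == 0 else x1 * h2(x1 - 1)


def h2(x2):
    # Refers back to h1.
    return 1 if x2 == 0 else x2 * h1(x2 - 1)


print(h1(4))
```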

If El, E2, and S are statements, and Hl and H2 are proce- 
dure identifiers that are (possibly) defined simultaneously 
recursive, the ALGOL/60 block 

BEGIN 

REAL PROCEDURE H1(X1); £1; 

REAL PROCEDURE H2(X1); E2; 


8 
END 


can be correctly defined by the target language translation

(1)   (λπ.(λH1.(λH2.s (HD π)) (TL π)) (Y² λH1.λH2.λX2.e2,λX1.e1))

where e1, e2, and s are the target language expressions for
the ALGOL/60 statements E1, E2, and S and the fixed point
operator Y² is

λF. LET π1,π2 = 'Λ','Λ'
    IN LET Z = ((F π1) π2)
       IN (π1 ASSIGN. HD Z);
          (π2 ASSIGN. TL Z);
          Z
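
The role of the fixed point operator can be sketched in Python. This is only an illustration under stated assumptions: mutable cells stand in for the variables π1 and π2, a Python tuple stands in for the target language list, and the function F is applied to both placeholders at once rather than one at a time. The names `Cell` and `fix2` are hypothetical, and the sketch pairs each placeholder with its own procedure in declaration order, which simplifies the ordering used in expression (1).

```python
class Cell:
    """A mutable placeholder, standing in for a variable that can be assigned later."""

    def __init__(self):
        self.value = None  # plays the role of the initial null string

    def __call__(self, *args):
        # Calling the placeholder forwards to whatever was assigned to it.
        return self.value(*args)


def fix2(f):
    """Tie the knot for two mutually recursive definitions.

    f receives the two placeholders and returns a pair of functions
    whose bodies may refer to either placeholder.
    """
    p1, p2 = Cell(), Cell()   # LET p1,p2 = null,null
    z = f(p1, p2)             # LET Z = F applied to the placeholders
    p1.value, p2.value = z    # assign each placeholder its definition
    return z


# The simultaneous recursive definition of factorial from the text.
h1, h2 = fix2(lambda H1, H2: (
    lambda x1: 1 if x1 == 0 else x1 * H2(x1 - 1),
    lambda x2: 1 if x2 == 0 else x2 * H1(x2 - 1),
))

print(h1(4))  # 24, the value given in the text for "H1(4)"
```

The placeholders are created before the procedure bodies exist, so each body can mention the other procedure; the later assignments fill the placeholders in, which is the effect the assignment expressions inside the operator are meant to achieve.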




Extending the alternate notation for recursive definitions
given earlier, an expression of type (1) will be alternately
written

LET REC H1,H2 = λX1.e1,λX2.e2
IN s

and further rewritten

LET REC H1(X1),H2(X2) = e1,e2
IN s


More generally, if H1, H2, ... , Hk are declared variables,
arrays, switches, or procedure identifiers whose target
language translations are the expressions t1, t2, ... , tk, and
s is the target language translation of a statement, an
expression of the form

(2)   (λπ.(λH1.(λH2...(λHk.s (1st π)) (2nd π)) ... (kth π))
           (Yᵏ λH1.λH2...λHk.tk,...,t2,t1))

where

   1st π = (HD π)
   2nd π = (HD (TL π))
   ...
   kth π = (HD (TL (TL ... π)...))

   Yᵏ = λF. LET π1,π2,...,πk = 'Λ','Λ',...,'Λ'
            IN LET Z = (...((F π1) π2) ... πk)
               IN (π1 ASSIGN. (HD Z));
                  (π2 ASSIGN. (HD (TL Z)));
                  ...
                  (πk ASSIGN. (HD (TL (TL ... Z)...)));
                  Z

and

   if Hi, 1≤i≤k, is a procedure definition of j variables
   X1,X2, ... , Xj
   then the expression ti is given as λX1.λX2...λXj.ei, where
   ei is the target language translation of the procedure
   body,

will correctly define the (possibly simultaneous recursive)
definitions in s.

Further extending the alternate notation for k simultaneous
recursive definitions, an expression in the target language
of form (2) will alternately be written

LET REC H1,H2,...,Hk = t1,t2,...,tk
IN s

Furthermore, if Hi, 1≤i≤k, is a procedure definition of j
variables X1,X2,...,Xj, then Hi and ti will be given as
Hi(X1,X2,...,Xj) and ei, where ei is the target language
translation of the procedure body.


For Statements: 


Consider the following ALGOL/60 for statement: 
(1)   FOR X:=1, 2 STEP 2 UNTIL 7 DO X:=X+1

Here, since the control variable is itself updated in the
statement "X:=X+1", the statement "X:=X+1" is evaluated only
three times, for the values of the control variable "X" equal
to "1", "2" and "5". The critical point in this evaluation
is that the increment for the control variable "X" is delayed
until the statement following the "DO" is executed, possibly
changing the current value of the control variable. Similarly,
the evaluation of a for statement of the form

(2)   FOR X:=Q, U STEP V UNTIL W DO s;

where "s" is some statement, can result in changing the
values of "X", "U", "V", or "W" before each iteration of the
statement. The delay in the evaluation of for list elements
is handled through the use of dummy variable functions. For
example, consider the following function definitions:
REC STEP(A,B,C) = LET A',B',C' = (A 'Λ'),(B 'Λ'),(C 'Λ')
                  IN (B'>0)∧(C'<A') => 'Λ'
(BY<o)A(A'<c') => ee taper as 
ai oe uae), Bo) 




REC DELAY_CAT L = LET H,T = HD L, TL L
                  IN LET H' = (H 'Λ')
                     IN (T = 'Λ') => H'
                        (H' = 'Λ') => (DELAY_CAT T)
                        ELSE => (H',T)


REC FOR(V,L,S) = LET H,T = HD L, TL L
                 IN (L = 'Λ') => 'Λ'
                    ELSE => V <= H; (S 'Λ');
                            FOR(V,(DELAY_CAT T),S)


and the following target language translation of the for
statement (2)

FOR(X,(DELAY_CAT λπ.Q,λπ.(STEP(λπ.U,λπ.V,λπ.W))),s')*


Here the function "DELAY_CAT", when applied to the list of
dummy variable functions in a for list, produces (a) the null
string or (b) the evaluation of the next element in the for
list followed by the dummy variable functions representing
the remaining elements in the for list. The function "FOR"
successively evaluates the statement within the for statement
for each of the successively computed elements in the for list.
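
The effect of the delay can be sketched in Python. This is a minimal illustration, not the target language definition: Python generators are assumed as a stand-in for the lists of dummy variable functions, zero-argument lambdas over an environment dictionary stand in for the delayed expressions, and the names `simple`, `step_until`, and `for_statement` are hypothetical.

```python
def simple(q):
    """A for list element consisting of a single delayed expression q."""
    def element(env, var):
        yield q(env)
    return element


def step_until(u, v, w):
    """A 'U STEP V UNTIL W' element; u, v, w are delayed expressions."""
    def element(env, var):
        value = u(env)
        while True:
            step, limit = v(env), w(env)  # re-evaluated before each iteration
            exhausted = (step > 0 and limit < value) or (step < 0 and value < limit)
            if exhausted:
                return
            yield value
            # Resumed only after the body has run, so the increment sees
            # any change the body made to the control variable.
            value = env[var] + v(env)
    return element


def for_statement(env, var, elements, body):
    """Run body once for each successively computed for list value."""
    for element in elements:
        for value in element(env, var):
            env[var] = value
            body(env)


# Statement (1): FOR X:=1, 2 STEP 2 UNTIL 7 DO X:=X+1
env = {"X": 0}
seen = []


def body(e):
    seen.append(e["X"])   # record the value the body is evaluated for
    e["X"] = e["X"] + 1   # X:=X+1


for_statement(env, "X",
              [simple(lambda e: 1),
               step_until(lambda e: 2, lambda e: 2, lambda e: 7)],
              body)
print(seen)  # [1, 2, 5]
```

Because each generator is resumed only after the body has been evaluated, the next value of the control variable is computed from its updated value, which reproduces the three evaluations for "1", "2" and "5" described for statement (1).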
*s' represents the target language translation of the source
language statement s.

The semantic constructs in ALGOL/60 are similar to those
in many other computer languages for performing numerical
calculations, e.g., FORTRAN, MAD, AED and portions of PL/1.
The semantic constructs in SNOBOL/1, defined in the previous
chapter, appear in part in several languages for string
manipulation, e.g., PANON/1B, TRAC and CONVERT. The
characterization of certain important linguistic features, like
structures in PL/1 and AMBIT/G and real-time operations in
PL/1, has not yet been attempted with the target language
presented in this dissertation. I suspect that the delay
feature in evaluating target language expressions will prove
useful in defining real-time operations and that modifications
to the target language will be needed to characterize
operations on structured data conveniently. Nevertheless, the
characterizations of SNOBOL/1 and ALGOL/60 have provided
significant tests of the target language in defining semantics,
and it is expected that future research will yield modifications
and extensions of the concepts presented here to define
more varied computer languages.

Since the discussion in this chapter has focused on a
simplified exposition of certain constructs in ALGOL/60, the
remainder of this chapter will be devoted to a detailed
explanation of the complete formal definition of ALGOL/60,
as given in Appendix 4.


Two Abbreviations for the Canonical Systems of ALGOL/60:* 


*The remaining portions of this chapter are for those who wish
to study in detail the formal definition of ALGOL/60 given in
Appendix 4.

Besides the abbreviations introduced earlier, two
abbreviations have been added to the notation for canonical
systems in writing the canonical systems for ALGOL/60. The
first of these abbreviations allows the user to abbreviate
constructions defining an alternating sequence of two other
constructions (for example, defining a "for list," which
consists of a sequence of for list elements each separated by a
comma). Examples of the variants of this abbreviation are
given in examples 7 in the table below. The formal definition
of this abbreviation is given in productions 21 of
Appendix 1.3.

The second of these abbreviations generally allows the
user to use a slash to abbreviate productions that are
repeated for each of the constructions defining real, integer,
and boolean quantities in ALGOL/60. An example of the use
of this abbreviation is given in example 8 in the table
below. The formal definition of this abbreviation is given in
productions 22 of Appendix 1.3.


Notes on the Canonical System Defining the Syntax of ALGOL/60: 


Predicates Needed to Specify Context-Sensitive Requirements: 


To specify the context-sensitive requirements on the
syntax of ALGOL/60, a number of additional predicates (S31
through S41) are used. The predicate "TYPE" (S31.1) defines
a set of three members, the strings "REAL", "INTEGER", and
"BOOLEAN". The predicate "DIMM" defines a set consisting of
strings of ones, where the number of ones in a string gives
the dimension of an array. The predicate "SPEC" defines a
set of strings, where each string specifies the use of some
formal parameter in a procedure declaration. The predicate
"SPEC LIST" defines a set where each member is a string of
parameter specifications each separated by a comma. For
example, if "P" is a declared procedure with two formal
parameters "X" and "A", and "X" is used as a real variable and
"A" is used as an integer array of dimension three, the
specification list for the occurrence of the procedure
declaration is "REAL,INTEGER ARRAY(111)".


EXAMPLES OF ABBREVIATIONS USED IN THE CANONIC SYSTEMS OF ALGOL/60

UNABR PRODS   FOR LIST EL<e> → FOR LIST<e>;
              FOR LIST EL<e>, FOR LIST<ℓ> → FOR LIST<ℓ,e>;
ABR PRODS     FOR LIST EL<e> → FOR LIST<ALTSEQ(e ,)>;

UNABR PRODS   PRIM<p> → TERM<p>;
              PRIM<p>, MULT OP<m>, TERM<t> → TERM<tmp>;
ABR PRODS     PRIM<p>, MULT OP<m> → TERM<ALTSEQ(p m)>;

UNABR PRODS   FOR LIST EL<e..e'> → FOR LIST<e..e'>;
              FOR LIST EL<e..e'>, FOR LIST<ℓ..ℓ'> → FOR LIST<ℓ,e..ℓ',e'>;
ABR PRODS     FOR LIST EL<e..e'> → FOR LIST<ALTSEQ(e ,)..ALTSEQ(e' ,)>;

UNABR PRODS   PRIM<p..p'> → TERM<p..p'>;
              PRIM<p..p'>, MULT OP<m>, TERM<t..t'> → TERM<tmp..(m(t' p'))>;
ABR PRODS     PRIM<p..p'>, MULT OP<m> → TERM<ALTSEQ(p m)..APPLIC(p' m)>;

UNABR PRODS   BOOL SEC<s..s'> → BOOL FAC<s..s'>;
              BOOL SEC<s..s'>, BOOL FAC<f..f'> → BOOL FAC<f∧s..(∧(f' s'))>;
ABR PRODS     BOOL SEC<s..s'> → BOOL FAC<ALTSEQ(s ∧)..APPLIC(s' ∧)>;

UNABR PRODS   REAL VAR:R VARS<i:i,>;
              INT VAR:I VARS<i:i,>;
              BOOL VAR:B VARS<i:i,>;
ABR PRODS     REAL/INT/BOOL VAR:R/I/B VARS<i:i,>;

The predicate "SPEC1:SPEC2:COMB" (S33) defines a set of
triples, where the first element is a parameter specification
designating some use of a formal parameter, the second element
is a parameter specification designating some other compatible
use of the parameter, and the third element is the parameter
specification designating their combined use. For example,
if the formal parameter "X" were used in three contexts, as
a real variable in an arithmetic expression, as a real
variable in a subscript list, and as a real variable that is
assigned a value in an assignment statement, the following
triples could be generated

<Λ:REAL:REAL> <REAL:REAL:REAL> <REAL:ASGNED:REAL ASGNED>

designating the combined use of "X" as a "REAL ASGNED"
variable. Note that if X is used both as a real and a boolean
variable, there is no way to combine the specifications "REAL"
and "BOOLEAN" to obtain the specification of the combined use
of "X". In the generation of legal programs, the use of this
predicate prevents the generation of illegal procedure
declarations containing such incompatible uses of formal
parameters.
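
The way successive uses are folded into one combined specification can be sketched in Python. This is a sketch under assumptions: only the combinations illustrated in the text are tabulated, the empty string stands in for the null specification, and the names `COMBINATIONS`, `combine`, and `combined_use` are hypothetical rather than part of the canonical system.

```python
NULL = ""  # stands in for the null specification (no use recorded yet)

# Only the pairs illustrated in the text; the full predicate (S33)
# defines more combinations than are listed here.
COMBINATIONS = {
    ("REAL", "REAL"): "REAL",
    ("REAL", "ASGNED"): "REAL ASGNED",
}


def combine(spec1, spec2):
    """Return the combined specification, or None if the uses are incompatible."""
    if spec1 == NULL:
        return spec2
    return COMBINATIONS.get((spec1, spec2))


def combined_use(uses):
    """Fold a sequence of uses of one formal parameter into a single specification."""
    result = NULL
    for use in uses:
        result = combine(result, use)
        if result is None:
            return None  # no triple exists, so no legal declaration is generated
    return result


print(combined_use(["REAL", "REAL", "ASGNED"]))  # REAL ASGNED
print(combined_use(["REAL", "BOOLEAN"]))         # None
```

The missing table entry for "REAL" with "BOOLEAN" plays the role of the missing triple: since no combined specification can be derived, the incompatible declaration is never generated.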

The predicate "SPEC MATCH" (S34) defines a set of ordered
pairs, where the first element is the parameter specification
of an actual parameter, and the second element is a compatible
parameter specification of the corresponding formal parameter.
The predicate "SPEC LIST MATCH" augments this set to include
lists of parameter specifications. For example, if "P" is a
procedure as defined above and "Q" is a declared integer
array of dimension three, the function designators "P(3.1,Q)"
and "P(TRUE,Q)" would have specification lists "ARITH EXP,
INTEGER ARRAY(111)" and "BOOLEAN EXP, INTEGER ARRAY(111)".
The specification list "REAL,INTEGER ARRAY(111)" would match
the specification list "ARITH EXP,INTEGER ARRAY(111)" but
would not match the specification list "BOOLEAN EXP, INTEGER
ARRAY(111)". Thus the use of this predicate prevents the
use of incompatible formal and actual parameters.
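
The matching of actual against formal specification lists can be sketched in Python. This is an illustration under assumptions: the only compatibility rule encoded beyond exact equality is the one from the example (an arithmetic expression may be passed where a real variable is specified), and the names `ACCEPTS`, `spec_match`, and `spec_list_match` are hypothetical.

```python
# Formal specification -> actual specifications it accepts besides itself.
# Only the case from the example is listed; the predicate (S34) covers more.
ACCEPTS = {
    "REAL": {"ARITH EXP"},
}


def spec_match(actual, formal):
    """True if an actual parameter specification is compatible with a formal one."""
    return actual == formal or actual in ACCEPTS.get(formal, set())


def spec_list_match(actuals, formals):
    """Match two comma-separated specification lists element by element."""
    actual_specs = [s.strip() for s in actuals.split(",")]
    formal_specs = [s.strip() for s in formals.split(",")]
    return (len(actual_specs) == len(formal_specs)
            and all(spec_match(a, f) for a, f in zip(actual_specs, formal_specs)))


formals = "REAL,INTEGER ARRAY(111)"
print(spec_list_match("ARITH EXP,INTEGER ARRAY(111)", formals))     # True
print(spec_list_match("BOOLEAN EXP, INTEGER ARRAY(111)", formals))  # False
```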

The predicate "USES:PARS WITH SPECS" (S35) defines a
set of ordered pairs, where the first element of each pair
contains several lists of formal parameters with each list
followed by a parameter specification enclosed in parentheses*
(e.g., "X,Y,Z,(REAL) A(111),B(1111),(BOOLEAN ARRAY)"), and
the second element contains the list of formal parameters
with each formal parameter followed by its parameter
specification (e.g., "X REAL,Y REAL,A BOOLEAN ARRAY(111),B BOOLEAN
ARRAY(1111)"). The predicate "PARS:USES:SPECS" defines a
set of triples, where the first element is a list of formal
parameters (e.g., "X,Y,A,B"), the second element is a list
of the uses of the parameters (e.g., "X REAL,Y REAL,A BOOLEAN
ARRAY(111),B BOOLEAN ARRAY(1111)"), and the third element is
the parameter specification list for the parameters (e.g.,
"REAL,REAL,BOOLEAN ARRAY(111),BOOLEAN ARRAY(1111)"). This
predicate is used to generate the specification list for the
formal parameters in a procedure declaration.

*If the formal parameter is an array identifier, the
identifier may be followed by the dimension of its subscript
list; if the formal parameter is a procedure identifier, the
identifier may be followed by the specification list for
its actual parameters.

The predicate "ENTRY" (S36) defines the set of elements
that can occur as auxiliary lists in the canonic system for
ALGOL/60. An entry is either an identifier, or an array
identifier followed by the dimension of the subscript list
given with the array identifier, or a procedure identifier
followed by the specification list of the actual parameters
given with the procedure identifier. The predicates "DIFF
CHAR", "DIFF STR", "DIFF ENTRY", "IN", "NOT IN", "NOT CONT",
"DIFF ENTRY LIST", "DISJ ENTRY LIST", "L1:L2:INTERSEC" and
"L1:L2:REL COMP" are similar to those given for SNOBOL/1.

One important exception to the similarity between the ALGOL/60
predicates and the SNOBOL/1 predicates occurs in the
definition of the predicate "IN" (S38.1). An entry is considered
to be contained in a list of other entries only if the
dimension of an array identifier or the specification list
of a procedure identifier matches the dimensions of
other identical array identifiers or the specification lists
of other identical procedure identifiers.


Specification of the Context-Sensitive Requirements: 


In general, the context-sensitive requirements on the
syntax of ALGOL/60 are specified by associating a number of
auxiliary lists with each syntactic unit and later requiring
that each of these lists has certain properties. The lists
are (a) the identifiers declared as real, integer, boolean,
or switch variables (S24 and S26.2), (b) the identifiers
used as real, integer, boolean, or switch variables (S8.3,
S9.1 and S12.2), (c) the identifiers declared as real, integer,
or boolean arrays (S25.9 and S25.10), (d) the identifiers
used as real, integer, or boolean arrays (S8.4 and S9.3),
(e) the identifiers declared as real, integer, boolean, or
non-valued procedures (S27.12), (f) the identifiers used as
real, integer, boolean, and non-valued procedures (S9.2, S9.9
and S9.10), (g) the labels* (S20.2 and S21.3) and label
references (S12.1), (h) the procedure identifiers and variables
that are assigned a value in an assignment statement (S18.1
and S18.2), and (i) the variables used in the arithmetic
expressions in an array declaration (S25.1).

*Leading zeros in a numeric label do not affect the value of
the label. For example, the strings "00149", "0149", and
"149" each denote the label with value "149". Thus, a label
is defined (S4) in the canonical system by a set of ordered
pairs, where the first element is a label and the second
element is its value. The auxiliary lists of labels and
label references contain the values of each label string.

The specification of the restrictions on each of these
lists is complicated. The lists of formal parameters,
parameters called by value, and labels in a procedure declaration
must contain identifiers each of which is different (predicate
"DIFF ENTRY LIST" in S27.12). The lists of formal parameters
used as real, integer, boolean and switch variables, the lists
of formal parameters used as real, integer, and boolean arrays,
the lists of formal parameters used as real, integer, boolean
and non-valued procedures, the lists of formal parameters
used to reference labels, and the lists of assigned procedure
identifiers must each be disjoint (predicate "DISJ ENTRY
LISTS" in S27.12). The lists of declared identifiers and
labels in a block must each contain different identifiers
(predicate "DIFF ENTRY LIST" in S29). The lists of
identifiers used as variables, arrays, procedures, and labels must
each be disjoint (predicate "DISJ ENTRY LISTS" in S29).

The lists of identifiers used in a procedure declaration
but not specified as formal parameters (the primed variables
in S27.12), the lists of identifiers used in a block but not
declared in the block (the double primed variables in S29),
and the lists of identifiers used in the bound pair list of
an array declaration (the variables with a subscript "m" in
S29) must be obtained and specified as used identifiers in
the procedure declaration or block. Furthermore, with each
declaration (S25.4) or use (S8.4 and S9.3) of an array
identifier, the dimension m of the associated bound pair list or
subscript list is kept with the identifier in the auxiliary
lists of declared and used arrays. Similarly, with each
procedure declaration (S27.12) and function designator (S9.2,
S9.9 and S9.10), the specification list x of the formal or
actual parameters is kept with the identifier in the auxiliary
lists of declared and used procedures. The specification list
for a procedure declaration is obtained through the predicate
"PARS:USES:SPECS" discussed earlier. The restrictions that
the dimension of each use of an array identifier must match
its declared dimension and that the actual and formal
parameter lists must be compatible are specified through the
predicates "PARS:SPECS:USES", "L1:L2:REL COMP" and
"L1:L2:INTERSEC" as discussed earlier.

Finally, a string is defined as a syntactically legal
program only if the lists of used but not declared variables,
arrays, procedures, labels, label references, and assigned
procedure identifiers are each given as null (S30.3).
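
The checks placed on the auxiliary lists can be sketched in Python for a single outermost block. This is a simplified illustration, not the canonical system itself: entries are assumed to be (identifier, extra) pairs in which the extra part holds an array dimension such as "111" or is empty for a simple variable, only variables and arrays are handled, and the function names are hypothetical stand-ins for the predicates named in the text.

```python
def name(entry):
    """The identifier part of an (identifier, extra) entry."""
    return entry[0]


def diff_entry_list(entries):
    """True if every identifier in the list is different."""
    names = [name(e) for e in entries]
    return len(names) == len(set(names))


def disj_entry_lists(*lists):
    """True if no identifier occurs in more than one of the lists."""
    seen = set()
    for entries in lists:
        names = {name(e) for e in entries}
        if seen & names:
            return False
        seen |= names
    return True


def rel_comp(used, declared):
    """Entries used but not declared; the dimension must agree to count as declared."""
    return [e for e in used if e not in declared]


def legal_block(declared_vars, declared_arrays, used_vars, used_arrays):
    """A block is accepted only if the used-but-not-declared lists are null."""
    return (diff_entry_list(declared_vars + declared_arrays)
            and disj_entry_lists(used_vars, used_arrays)
            and rel_comp(used_vars, declared_vars) == []
            and rel_comp(used_arrays, declared_arrays) == [])


ok = legal_block([("X", "")], [("A", "11")], [("X", "")], [("A", "11")])
bad_dimension = legal_block([("X", "")], [("A", "11")], [("X", "")], [("A", "111")])
undeclared = legal_block([("X", "")], [], [("X", ""), ("Y", "")], [])
print(ok, bad_dimension, undeclared)  # True False False
```

Keeping the dimension inside the entry is what makes a use of "A" with three subscripts fail to match a declaration of "A" with two: the relative complement is then not null and the string is rejected.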


Notes on the Canonical System Specifying the Translation 
of ALGOL/60 


Three additional predicates (T42) are used in the
specification of the translation of ALGOL/60 into the target language.
The predicates "LIST:CORR NULL LIST", "LIST:CORR UNSHARE LIST",
and "LIST:CORR INDEXED LIST" define sets of ordered pairs
where the first element of each pair is a list of identifiers
(e.g., "X,Y,Z,") and the second element of each pair is
respectively (a) the corresponding list of null strings (e.g.,
"'Λ','Λ','Λ',")*, (b) the corresponding list of expressions
applying the function "UNSHARE" to each identifier (e.g.,
"(UNSHARE (X 'Λ')),(UNSHARE (Y 'Λ')),(UNSHARE (Z 'Λ')),"),
and (c) the corresponding list of identifiers each followed
by a "#" and a positive integer (e.g., "X#1,Y#1,Z#1,").

*In the target language these lists are used in expressions
like "LET X,Y,Z, = 'Λ','Λ','Λ', IN ...". Strictly speaking,
the last comma in each list should be removed.
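
The three corresponding lists can be sketched as simple string transformations in Python. This is a sketch under assumptions: the null string is written 'Λ', every identifier is given the same index as in the example, and the function names are hypothetical counterparts of the predicates named above.

```python
def identifiers(id_list):
    """Split a list such as "X,Y,Z," into its identifiers."""
    return [ident for ident in id_list.split(",") if ident]


def corr_null_list(id_list):
    """One null string per identifier, each followed by a comma."""
    return "".join("'Λ'," for _ in identifiers(id_list))


def corr_unshare_list(id_list):
    """An application of UNSHARE to the value of each identifier."""
    return "".join(f"(UNSHARE ({ident} 'Λ'))," for ident in identifiers(id_list))


def corr_indexed_list(id_list, index=1):
    """Each identifier followed by "#" and a positive integer."""
    return "".join(f"{ident}#{index}," for ident in identifiers(id_list))


print(corr_null_list("X,Y,Z,"))     # 'Λ','Λ','Λ',
print(corr_unshare_list("X,Y,Z,"))  # (UNSHARE (X 'Λ')),(UNSHARE (Y 'Λ')),(UNSHARE (Z 'Λ')),
print(corr_indexed_list("X,Y,Z,"))  # X#1,Y#1,Z#1,
```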




CHAPTER VI 


DISCUSSION 


This thesis describes a formal system for defining the
rules for writing programs in a computer language and for
defining what these programs mean. The author strove for
simplicity of the formal system, and then applied the formal
system to define two complete computer languages, ALGOL/60
and SNOBOL/1.

Besides simplicity, such attendant qualities as
naturalness, perspicuity, and communicativeness have been
accorded due allowance. Necessarily, I have used my personal
discretion in weighing these qualities. It is inevitable
that further research will refine the optimal balance of
these qualities. Admittedly, there exist no known metrics
for measuring these qualities precisely. They are subject
to a latitude of interpretations. This fact should not be
surprising. Indeed, almost every computer language has at
least the theoretical capability of defining any computable
algorithm. Why so many computer languages? It is more
natural or more concise to define an algorithm in one
language than another.

Canonical systems were used here to define the
syntactically legal strings in a computer language and the
translation of the legal strings into strings in some other
language. Not once was it necessary to step outside the formalism
to define the syntax or translation of a language. Although
some complexity was added to the formalism by introducing
abbreviations to the basic notation, even the abbreviations
were ultimately defined in terms of the basic formalism.
Extended Markov algorithms and the λ-calculus
were used as a basis for defining semantics. Prior to this
effort, work had been done by others in using formalisms
like recursive function theory, Markov algorithms, formal
graph theory, and the λ-calculus to characterize computational
processes. However, the marriage of extended Markov
algorithms to the λ-calculus is to my knowledge the first attempt
where two formalisms have been intimately combined to
characterize computational processes. Almost every construction
in SNOBOL/1 and ALGOL/60 was defined solely within the
combined formalism. The introduction of new expressions to the
combined formalism to mirror the assignment and transfer of
control constructions in SNOBOL/1 and ALGOL/60 appeared
unavoidable. Nevertheless, these additions accomplished
complete definitions of the semantics of both languages.
Moreover, the entire target language was eventually defined by
an extended Markov algorithm defining a machine for evaluating
strings in the target language.
The extended Markov algorithm definition of the target
language evaluator not only reduced the definitions of
semantics to a single formalism, but also demonstrated that
a computer possessing only the characteristics needed to
evaluate an extended Markov algorithm is sufficient to
execute source language programs translated into the target
language. The conventional machine facilities existing in
most computers, like those for performing arithmetic and
logical operations and those for transferring control within
a program, are not needed to evaluate target language
programs, although they may be convenient. On the other hand,
such horribly detailed machine facilities as those for
shifting bits or branching on the setting of a mask appear
to be useless in evaluating target language programs. The
ability to use extended Markov algorithms as the basic
evaluating mechanism for computational processes suggests that
machine languages quite different from those conventionally
used might be more effective for defining computational
processes. However, this subject is, at least, worth another
doctoral dissertation.

One may well ask: Why was one formalism, canonical
systems, used to define the syntax and translation of a
language? Why was another pair of formalisms, extended Markov
algorithms and the λ-calculus, used to define the semantics
of a language? And why was just extended Markov algorithms
used to define the target language evaluator? The following
are my answers. First, it appears convenient to define the
syntax and translation of a language with a generative grammar
(which canonical systems provide) that frees the language
designer from the details of specifying a scanning algorithm
for determining whether a source language string is
acceptable. Second, a computer language generally specifies some
well-defined algorithm for performing a computation, and
hence it seems somewhat natural to define the semantics of
a computer language with some simpler algorithmic formalisms
(like extended Markov algorithms and the λ-calculus).
Third, extended Markov algorithms alone were sufficient to
define the target language evaluator. Fourth, the
considerations of naturalness and perspicuity arise again. The
formalism of canonical systems seemed well-suited to define
the syntax and translation of a language, the combined
formalism of extended Markov algorithms and the λ-calculus
readily lent itself to defining what a language means,
and extended Markov algorithms provided the desired concise
definition for the target language evaluator. In short,
different formalisms model different processes with different
degrees of complexity.

I have attempted to separate the specification of the syntax
and semantics of a language into three parts: (1) the specification
of the legal strings in a language, (2) the specification of the
translation of the legal strings into the target language, and (3) the
specification of the primitive functions used in the target language.
Although each of these specifications must depend on the others for
its correctness, the specification of the primitive functions in the
target language was written for the most part after the specification
of the translation of the source language into the target language and
resulted in few changes to the definition of translation. On the other
hand, it is unfortunate that the specifications of the syntax and
translation depended heavily on each other. A change in the specification
of the syntax often required a change in the specification of the
translation, and vice versa. It would certainly be valuable to develop a
convention that would better isolate the specification of the syntax and
translation.

Although the semantics of a source language was formally
defined here by the target language, and although canonical
systems specify only the syntax of a language, a large portion
of the semantics of the source language was somewhat
imperceptibly defined in the canonical system defining only the
syntax of the language. By using descriptive predicate names like
"ARITH EXP", "COND STM", and "LABEL", a correspondence with
the English language was made to aid the reader's
understanding of what was being talked about, i.e., the semantics of
the constructions being defined. A similar use of the
English language occurs in a Backus-Naur form specification
of a computer language. The use of metalinguistic variables,
like "ARITH EXP", "DIGIT", and "PRIMARY" in productions like
"<ARITH EXP> ::= <DIGIT> | <PRIMARY>", does convey some idea
of what the specified strings mean, although strictly speaking
the productions define only certain legal strings in a
language. In this way both canonical systems and Backus-Naur
form make good use of one of the most popular meta-languages,
the English language.

There are several immediate uses of the formal system
presented here. First, when developing a language, it would
be desirable to have a formal definition specifying precisely
what strings are allowed in the language and what the strings
mean. Such a formal definition could be given to others for
their analysis and would sharpen the debate over whether the
convenience of each construction in the language would be
worth the difficulty in explaining or implementing the
construction. Second, after the designers agreed upon the
constructions in the language, the formal definition would be
valuable to those implementing the language or those
preparing the language manuals in that they would know unambiguously
what was intended by the language designer.

The formal system presented here opens several avenues
for future research. As previously mentioned, since canonical
systems can define precisely both the syntax and translation
of a language, canonical systems might be used as the basis
for automatic translation between computer languages. If an
efficient algorithm could be developed to recognize strings
specified by a canonical system and generate their translation,
a canonical system definition of a language could be
immediately used to translate legal programs in the language into
another language. Another use of the formal system might be
in the implementation of "extensible" computer languages.
Once the productions defining the syntax and semantics of a
language have been added to or changed, the new productions
could be given to the algorithm for translating strings
specified by a canonical system, thereby implementing the
extended language.

The author has attempted to integrate and adapt three
known formalisms to define computer languages. These formalisms
have been blended into a formal system for defining computer
languages rigorously and somewhat concisely. The most
significant portions of the attempt here are the application of
canonical systems, the marriage of extended Markov algorithms
with the λ-calculus, and the application of extended
Markov algorithms to define an evaluator for the target
language. It is hoped that this work is a progressive step in
achieving the thesis of this dissertation, to meet the need
for formal methods for completely defining computer languages.


Appendix 1.1   CANONICAL SYSTEM SPECIFYING THE SYNTAX OF A SUBSET OF ALGOL/60


(a) Basic notation only

DIGIT         DIGIT<1>;
              DIGIT<2>;
              DIGIT<3>;
VAR           VAR<A>;
              VAR<B>;
PRIMARY       DIGIT<d> → PRIMARY:VARS<d:Λ>;
              VAR<v> → PRIMARY:VARS<v:v,>;
ARITH EXP     PRIMARY:VARS<p:v> → ARITH EXP:VARS<p:v>;
              PRIMARY:VARS<p:v>, ARITH EXP:VARS<a:u> → ARITH EXP:VARS<a+p:uv>;
STM           ARITH EXP:VARS<a:u>, VAR<v> → STM:VARS<v := a:v,u>;
TYPE LIST     TYPE LIST<A>;
              TYPE LIST<B>;
              TYPE LIST<A,B>;
DEC           TYPE LIST<t> → DEC:DEC VARS<INTEGER t:t,>;
PROGRAM       STM:VARS<s:u>, DEC:DEC VARS<d:v>, IN<u:v> → PROGRAM<BEGIN d; s END>;
IN            IN<A,:A,>;
              IN<B,:B,>;
              IN<A,:A,B,>;
              IN<B,:A,B,>;
              IN<x:ℓ>, IN<y:ℓ> → IN<xy:ℓ>;


(b) With abbreviations

DIGIT         DIGIT<1>,<2>,<3>;
VAR           VAR<A>,<B>;
PRIMARY       DIGIT<d> → PRIMARY<d>;
              VAR<v> → PRIMARY:VARS<v:v,>;
ARITH EXP     PRIMARY<p> → ARITH EXP<p>;
              PRIMARY<p>, ARITH EXP<a> → ARITH EXP<a+p>;
STM           ARITH EXP<a>, VAR<v> → STM:VARS<v := a:v,>;
TYPE LIST     TYPE LIST<A>,<B>,<A,B>;
DEC           TYPE LIST<t> → DEC:DEC VARS<INTEGER t:t,>;
PROGRAM       STM:VARS<s:u>, DEC:DEC VARS<d:v>, IN<u:v> → PROGRAM<BEGIN d; s END>;
IN            IN<A,:A,>,<A,:A,B,>,<B,:B,>,<B,:A,B,>;
              IN<x:ℓ>,<y:ℓ> → IN<xy:ℓ>;
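
The set of strings this canonical system derives can be approximated by a small recognizer in Python. This is a sketch under assumptions, not the generative derivation itself: a regular expression is used for the context-free shape, the exact spacing of the program text is assumed, and the premise on the predicate IN is modelled as set inclusion of used variables in declared variables. The name `is_legal_program` is hypothetical.

```python
import re

# Context-free shape: BEGIN INTEGER <type list>; <var> := <primary>{+<primary>} END
PROGRAM_SHAPE = re.compile(
    r"BEGIN INTEGER (A|B|A,B); ([AB]) := ([AB123](?:\+[AB123])*) END"
)


def is_legal_program(text):
    """True if text has the right shape and every used variable is declared."""
    match = PROGRAM_SHAPE.fullmatch(text)
    if match is None:
        return False
    declared = set(match.group(1).split(","))
    # Variables used: the assigned variable plus any variable primaries.
    used = {match.group(2)}
    used |= {p for p in match.group(3).split("+") if p in ("A", "B")}
    # Counterpart of the premise IN<u:v>: the used list is contained
    # in the list of declared variables.
    return used <= declared


print(is_legal_program("BEGIN INTEGER A,B; A := B+1 END"))  # True
print(is_legal_program("BEGIN INTEGER A; A := B+1 END"))    # False
```

The second string has the same context-free shape as the first but uses the undeclared variable "B", which is the kind of context-sensitive requirement the auxiliary list of variables carried by each predicate is there to express.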


Appendix 1.2 CANONICAL SYSTEM SPECIFYING THE TRANSLATION OF
TEE GvbeRT x97 sgememaee Lanewass® 


(a) Basic notation only
BYOTT<1>5 


> PRIMART: VARG<d, oF 'S' 2 A>; 
> PRIMARY: VARG<v..viv,>3 2 
PRIMARY: VARG<p..p'tv> + ARITH REP: VARS<p,. L l,pt @L0ad pre; 
PRIMART: VARSsD..p?iv>, ASTLE EPs VARS<c, .a'ru> 
+ ARITE EXP: VARS<aep..0'd A A,p* PADD prev; 
ARITE EXP: VARG<a..0°tu>, VAR<Y> 
> STU: VARG< 12 o..0°3 SP 1,v “SPORE RESULT IB viv,u>s 


eee 


w vuuw fo 08 pe 
uw FUN Hee 


e 


TTPS LIST<A,B..A 
TIPE LIST<s..A°> + BSCiDSC VARO<IMPESER £..2'38,>5 


OTH: VARE<s,.0's>, BECDBC VARS<G..4'sv>, JE<usv> : 
Peoehan< 4; s E=D..*s00RRLOR 
wszee ©,15 *ruvORN ABGEBLERIc¢ 
VOR VARIABLES 43 ame, 


rrr 
rawr 


Tiey:a> + YNenyetrs 


(b) With abbreviations
> PRIMARY <é. .oP*G' >; 
ASTSR EEP<a,.0'> 
VAR<v> 
IB wr; 


STPR LIST<A..A BS P>,<B..B RS PF, <ck,B.A BS PUD BS Py 
SEPE LIOR<2..2°'> <« BOCCIVERORR 2..4°>5 


CPice..0°>, BOEG..0'> + PROGRANCDEGIN 4; 0 ‘BED. /oARSDOLES 
15.0 SORT ness averevend .. weiee. 
e158 srepees wewmLEne © fad enerees. £0 supanrrsen. 


The symbol “3° denotes a new line.




033 ALPHA<0>,<1>, PITT Ts D? LY til bled bt tet MS bed | 
PRED CHAN<O> ,<1>, 2.69 ,CA>, <B>, 00. tES 
PRED CRAR<e> > PRED PAaRT<a>; é $ 
PRED CHAR<a>, PRED PART<p> + PRED PART<pa>; : 
+ PRED ALPHA<p>; 
» UP PRED<q> + PRED ALPRA<p* :°q>; 


VaR ALPHA<“a%>,<°B°>, 2.554 °O§ 
SUR OR SUPERSCRIPT<,>, <)>, covet ers <_s % 


VAR ALPHAcy>, SUB OR SUPERSCAIPT<y> + YAR ALPHA<ve>; 
PERN<A>; 


>» Pe Sie | 


~o WE TERN<a>: 
> WP TERN<e>; 
+ WF TERMtr>; 
+ WF TERM TUPLE<<t>>, 
vw TERK TUPLE<<r>> © WF PERM TUPLE<<r* : “t>>; 


ED ALPHA<p>, WF TERM TUPLE<t> «+ WF ATOM PORN<pt>; 


ATON PROD<p> + WP PRENIBE<p>; 
ATOM FROD<p> + WP CONCLUBION<p>; 


COBCLUSIOR<¢> WP ATON PROD<c >; 
ATOM PROD<p> VF PROD< p>; 
PREMNISE<p>, WF CONCLUBION<c> WP PROD<pec;>; 

F PROD<e-c;>, WF PRENISE<p> WP PROD<2 ,pre;>s 


PROD<p> > WP CaRONICAL SISTEN<p>, | ! 
PROD<p>, WF CANOBICAL SYSTRM<c> + WP CANONICAL SXSTEN< p>; 


' 


Productions defining the rules for deriving strings specified by a canonical system


PREWS :DERIV 


ONICAL SYSTEM STA<a>, WF CANONICAL SYSTEN<e> + CANONICAL SYSTEN<e>; 
ONICAL SYSTEM<ept>, WP PROD<p> + PROD«<p>; 


ey { 
ROD<p>, SUBST<pivierq>, STR WITH BO VARS<q> + PROD INSTAECE<e; 


ERIVATION<A>3 : | 

DERIVATION<é>, PROD Tesranca<e;>, . WP CONCLUBION<c> a 
+ BERIVATION<2 o> 

DERIVATION<4>, prod IEBTANCE<p+c,>, PREMD:RBRIV CONT. PRBBBep a> 
° Denivasreded. ¢ @>s 


OBJ ALPHA<a> → OBJ STR<a>;
OBJ ALPHA<a>, OBJ STR<s> → OBJ STR<sa>;
WITE BO vanac,>, ibe he Oe oe tet Oe Se eS | 
eY RLPTA<a> oom “wits 80 VARO<a>; 
RED ALPUA+a> : 2 STR WITH 80 YaRace>, 
Witw' Wo VaRs<ea>,<t> + STR VITE ees. 
‘AR CHARI DIFF vaR CHAR<a: b> 2eser>, 65, eure, < of >,* ota”? ates 
<" sung 
‘AR ALPEAcecs>, <ady>, vaR CHAR: DIFF VAR CHAR<c:4@> 
> VAR: DIFF VAR<sext sdy>; 


AR ALPHA<w>, OBJ STR<a> SUBST<viszv:8>; 
VAR ALPHA<y>, OBJ BTR<a>, STR WITH BO VARS<t> SUBST< er 8301); 
AR ALPEA<w>, OBJ STR<e>, VAR:DIFF VAR<viw> SUBET<viasvry>; 
BUBST<vissusy>,<visiz'sy'> SUBST<vierzx! syy'>; 


P paarty co PRENS< pr p>; 
PEEUR i DEY CONT PuEmsceie o>, 


AON peso vOReD> , PREM DERIV CONT PREMS<01t> 
<e pit Pp? > 




(e) Productions defining the rules for converting an abbreviated canonical system into unabbreviated
form. (The following mnemonics are used here: P = Production, AP = Atomic Production,
CS = Canonical System.)


WP PROD<p~0;> 

WP PROD<p+c3>, ABR1 PrP<pos;:t;> 
WP ATOM PROD<e4> 

WP ATOM PROD<e,>, ABRL APL AP<8;:t3> 
ABA] CBrCB<Ar A>; 

ABR1 CB:CB<e:d>, ABR] F:P<p:q> + ABR1 CS:C8<eprdqrs 
ABR1 CB:CB<erd>, ABR] APLAP<prq> + ABRL C8:C8<eprdq>; 


ABR1 PrP<presipees>s 
ABRL PrP<pre 93 tprezts>; 
ABR) AP1AP<e,2¢ 53>; 

ABR) AP: AP<s,e; :t303>3 


eee 


(coll od ll exlndlendaad 
FUR KH Vr NM AAW RU AU re 


VR 


BR2 CB:CB [C8 DELINITER< ,>; 

C8 DELINITER< +>; 

C8 DELIMITER< >; 

WP ATOR PROD<p<t>> > ABR2 STR: STR<p<t>rp<t>>; 

WP ATOM PROD<p<t>>, ABR2 STRiSTR<p<cyr>is> + ABR2 STRiSTR<p<r>,<t> PH ypctrrs 
ABR2 CB:CB<AtA>s 

ABR2 CBrCB<e:d>, ABR2 STR:BTA<a:t>, CS DELINITER<m> + ABR2 CB:C8<cen:éta>; 


an 


_ PAARHK 


WF ATOM PROD< p> ,<e> + ABR} PsPcpoe;yrpoas>s 

WP ATOM PROD<p>, ABR3 PrPcseczt> + ABR3 P:Pcplerespeoest>y 
ABRS CBsC8<AzA>; 

ABR3 CB:C8<crd>, ABR3 PrP<pry> + ADRS CB:C8<cprdq>; 
ABR3 C8:CB<c:a>, WP PROD<p>, NOT IR<]:p> «+ ABR3 CSrCB<eprdq>; 


hela al oti edad slick adel od 


PRED PART<p> ,<q>, VAR ALPYAca>,<b>, DIFF STA<ard> 
+ ABRL P:P<p<ersq<SEQ(a)>; 2: pcersqcar; pear, q<b>oqcad> 5>5 
BRA CE:CB<AtA>; 
BR’ CB:CB<crd>, ABR PrP<p:y> > ABRh CB:CB<ep:dq>; 
ABRA CS:CB<c:4>, WR PROD<p>, BOT COMT<SEQ(sp> + ABRE CE:CB<eprdqr; 


22 @ NAAN 


we Me 
ms 


19. 


PRED PART<x>,<p>, WF TERM<t >, aux PRED: TERMS<p, :t,>.<Poity>s 
STR<a> + ABRS PrPceonp post t tors ¢ oop, >! Ppyst ty” 1 “Atos 
PRED PART<x>,<y>, wr FERN<t > ,<t>5 AUX PRED: TERMS<pit>,<q,:7r.>, 

<Qgi hor s<Py It) >,<poity> STR<g>,<s">, VAR ALPRAcv>, NOT COST 

‘ ' 
<Vise ttyh Paty te” > ABAS PrP<e FQ, 42<t ry rome EP PPost ttt ors 
=3° ’ 

18 ya, Pa,<t Fr) t vre>8 EP, PPo<t ti tvts>soy 
ABRS P:P<piq>,<qir> + ABR3 P:P<pir>; 
ABRS CB:CB<ArA>; 
8 


PRED PART<p>, WF TERV<e>,<t> + AP SYB:AP PRI COMB< peed :p<t>sp<t>>; 
AP SYR: AP TR:COMB<p<m>:q<t>:r<u>>, AUX PRED: TERNB< a: m> 
+ AP SYN:AP TR: CONB<pd< am? tQ@<tm trd<mm>>, : 
AP SYS:AP TR: COMB<cpce>tqct>:rcu>>, AUX PRED: TERMS<drm>, BOT CONT<a:p> 
+ AP BYNIAP TR: CONB<pd<om>iq<t> trdcum>>5 
AP STHIAP PR¢COMB<p<e>zq<t>ir<u>>, AUE PRED: TERNS<d:m>, BOT CONT<d:p> 
+ AP BYR:AP TR:COMB<p<s>:qd<tm> i rdcum>>; 
ABRG CBsCS<AzA>; 
ABRE CB:CS<c:a>, WF ATON PROD<p>, CS DELINITER<m> + ABRE CS:CB<epa1épa>; 


BR6 CB:C8<e18>, AP SYN: AP TR:COMB<e:t:b>, CB DRLINITER<m> + ABRE CO1CS<ee//tut dba; 


PRED PART<p>,<q>,<r>, VAR ALPRAcu>,<¢r>,<w>,<ul>,<w'>,cwl>, OBJ 
ALPHA<s> DIFF STR<v:u>,<wiv>,<oru'>,<wiv'>,<wrw’>,<wi tur cwiiv>, 
<wiiut>ycwlivi> + ABRT PrP<pcu> + qcALTSEQ(u 6)>; 2 pcur+e<u>sp<ur, 
Qtv> + q<vsu>s>,<pcu>,rewr>eq<ALTOSG(u v)>; 1 pcwr+qcur>spcu>, r<w>, 
Q<¥>oq<vwur4>,¢p<u..u'>*Q<ALTSEQ(u o).. ALTSEQ(u' 8)>; s<p<u..ut> 
746. oUl>EPSu, .Ut> odS¥.. "> Qt teu. . T BU? s>,<pcu..U >_< ALTSRG(u s) 
« APPLIC(u’ 8)>3 2 p<u..ur>rqcu..ul> Mehr sg< v 

ae (Lor")ut)>y>,<psu..u'>,rgw, we >oqcAL TORQ le w). APPLIC 

d>4 01 psu. o'>eqcu..u>spcu. a> srw. wl? ecw. yh> 
*qcvuu.. (Cw? wt jut )>sr; 

ABRT CS:CB<A:A>;y 

ABRT CB:C8<crd>, ABRT P:P<prq> + ABRT CB:C8<cpidq>s 

ABRT C8:C3<crd>, WP PROD<p>, NOT CONT<ALTSEQ:p> + ABR? CB:CB<ep:dp>; 


BRO PrPL:REST<As As A>; 

BRO P:P):REST<p:q:r>, STR<s>, KO? CONT</ 19> 

BRS PrPL:RBST<p:qir>, IDSTR<i>, STR<i/s>, BOT COMT<[re> 
ABRG P:P1:REST<psqir> " 

BRS P:PS:REST<prqir>, CONT</:r>, ABR P:PS:REST<r:q'ier'> 
ABRO P:PS:REST<p:qrr>, WOT COBT</:r> 

BRE CH:CB<As A>; 

aBRS C8:C8<c:d>, WF PROD<p>, MOT CONT</:p> > ABAD. CS: CB<eprdp>; 
BRS CH:CB<c:4>, WF PROD<p>, CONT</:p>, ABRT P:P<piqg> + ABRO CO: CB<epidq>; 


ABRS P: Pl: REST<paraqstre>; 
ABRS Pi Pl: REST<pi/arqitre>s 
ABAG Pr PScREST<prqirrs 
ASRS Pr PSr:REST<psqqtir’>s 
aBRO PL P<piqr> 


eeeees 


ASM CANQHICAL SYSTEM STR<a>, ABRG CS:C8<a:b>, ABRS CB:CS<bic>, 
ABRT CB:CS<cia>, ABRS CO:CB<dre>, ABRI CBrCR<e:f>, ABRS C8:CB<fig>, 
ABR2 CB:CS<grh>, ABR1 CB:C8<h:i> + CANONICAL SISTEM BYR<i>; 


H 
{ 


ICMARKA> <B> yon gS Zr gs OPM SOAR y ce gS BM COM CLD, Lag KG ye eM Qk Ory cong SOPZE 
STR<Λ>;
STR<s>, CHAR<c> → STR<sc>;
DIFF CHAR<A:BY,<A:C>, we. cots 
STR<axs>,<ayt>, DIFF CHAR<x:y> → DIFF STR<axs:ayt>;
CHAR<c>, STR<sct> → CONT<c:sct>;
STR<s> → NOT CONT<s:Λ>;
NOT CONT<sx:t>, DIFF CHAR<x:y> → NOT CONT<sx:ty>;
NOT CONT<sxa:ty>, DIFF CHAR<x:y> → NOT CONT<sxa:tya>;
CS DELIMITER<,>,<→>,<;>;
AUZ PRED: TERNS<Ar A>; 
AUX PRED: TERNS<p:t>, PRED PART<q>, WF TERN<r> + AUX PRED: TERNG<p* :qit*1°r>3 
PREDICATES MATCH< A> 
PREDICATES MATCH<a>, WF ATOM PROD<p<t>>, CS DELINITER<m>, 
CB PREDICATES<q>, CONT<p,:q> + PREDICATES WATCE<ep<t>m> 




Appendix 1.4 DERIVATION OF A LEGAL PROGRAM AND
. TOs 


Rule 1: DERIVATION<Λ>;
Rule 2: DERIVATION<d>, PROD INSTANCE<c;>, WF CONCLUSION<c> → DERIVATION<d c>;
Rule 3: DERIVATION<d>, PROD INSTANCE<p→c;>, PREMS:DERIV CONT PREMS<p:d> → DERIVATION<d c>;
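
Rules 1 to 3 can be read operationally: a derivation starts out empty (Rule 1), and a conclusion may be appended either from a production instance without premises (Rule 2) or from one whose premises all occur in the derivation built so far (Rule 3). The following Python sketch illustrates that reading only. The encoding of a production instance as a (premises, conclusion) pair, the function name `extend`, and the simplified predicate strings are assumptions made for the example, not notation from the thesis.

```python
from typing import List, Tuple

# A production instance with every variable already replaced by an object
# string: (premises, conclusion). This encoding is an illustrative assumption.
Instance = Tuple[Tuple[str, ...], str]

def extend(derivation: List[str], instance: Instance) -> List[str]:
    """Append the conclusion if every premise is already in the derivation."""
    premises, conclusion = instance
    missing = [p for p in premises if p not in derivation]
    if missing:
        raise ValueError(f"premises not yet derived: {missing}")
    return derivation + [conclusion]

# Simplified instances (the VARS components of the appendix are omitted).
steps: List[Instance] = [
    ((), "DIGIT<1>"),                              # Rule 2: no premises
    ((), "VAR<A>"),                                # Rule 2: no premises
    (("DIGIT<1>",), "PRIMARY<1>"),                 # Rule 3
    (("PRIMARY<1>",), "ARITH EXP<1>"),             # Rule 3
    (("ARITH EXP<1>", "VAR<A>"), "STM<A:=1>"),     # Rule 3
]

derivation: List[str] = []                         # Rule 1: empty derivation
for step in steps:
    derivation = extend(derivation, step)
```

Each call either lengthens the derivation by one conclusion or rejects the step, which mirrors the way the tables in this appendix list one added conclusion per line.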


(a) Derivation of a syntactically legal program
Production 


from Conclusion added to derivation
App. l.le 


DIGIT<1> 

VAR<A> 
PRIMARY:VARS<1:Λ>
ARITH EXP:VARS<1:Λ>
STM:VARS<A:=1:A,>


TYPE LIST<A> 


DEC:DEC VARS<INTEGER A:A,> 
IN<A,:A,>


PROGRAM<BEGIN INTEGER A; A:=1 END>


(b) Derivation of a syntactically legal program and its translation into
assembler language.


Production : 
from Conclusion added to derivation


| App. 2.10 
DIGIT<1>
VAR<A> 
PRIMARY:VARS<1..=F'1':Λ>
ARITH EXP:VARS<1.. L 1,=F'1' *LOAD 1:Λ>
STM:VARS<A:=1.. L 1,=F'1' *LOAD 1
 ST 1,A *STORE RESULT IN A:A,>


TYPE LIST<A,..A DS F>


DEC:DEC VARS<INTEGER A..A DS F:A,>


IN<A,:A,>


PROGRAM<BEGIN INTEGER A; A:=1 END..
*ASSEMBLER LANGUAGE PROGRAM
 BALR 15,0 *SET BASE REGISTER
 USING *,15 *INFORM ASSEMBLER
 L 1,=F'1' *LOAD 1
 ST 1,A *STORE RESULT IN A
 SVC 0 *RETURN TO SUPERVISOR
*STORAGE FOR VARIABLES
A DS F
 END>




Appendix 2.1 CANONICAL SYSTEM SPECIFYING THE TRANSLATION OF
THE ALGOL/60 SUBSET INTO THE TARGET LANGUAGE


PRIMARY  DIGIT<d> → PRIMARY<d..'d'>;

VAR<v> → PRIMARY<v..v>;
PRIMARY<p..p'> → ARITH EXP<p..p'>;
PRIMARY<p..p'>, ARITH EXP<a..a'> → ARITH EXP<a+p..(+(a',p'))>;


ARITH EXP<a..a'>, VAR<v> → STM<v:=a..(v ASSIGN. a')>;
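
The productions above pair each source string with its target-language translation: a digit d translates to the quoted constant 'd', a variable to itself, a sum a+p to the prefix form (+(a',p')), and an assignment v:=a to (v ASSIGN. a'). The Python sketch below mimics that pairing for illustration. It assumes this reading of the (scan-damaged) productions, and the function names and the use of `str.split` in place of a derivation are choices made for the example only.

```python
def translate_primary(p: str) -> str:
    # DIGIT<d> -> PRIMARY<d..'d'> : a digit becomes a quoted constant.
    # VAR<v>   -> PRIMARY<v..v>   : a variable translates to itself.
    # (The subset only has one-character digits; multi-digit input is not
    # part of the grammar and is not handled specially here.)
    return f"'{p}'" if p.isdigit() else p

def translate_arith(expr: str) -> str:
    # ARITH EXP<a+p..(+(a',p'))> : sums are built left to right, one primary
    # at a time, so the translation nests to the left.
    primaries = expr.split("+")
    result = translate_primary(primaries[0])
    for p in primaries[1:]:
        result = f"(+({result},{translate_primary(p)}))"
    return result

def translate_stm(stm: str) -> str:
    # STM<v:=a..(v ASSIGN. a')>
    var, expr = stm.split(":=")
    return f"({var} ASSIGN. {translate_arith(expr)})"
```

For example, `translate_stm("A:=1")` yields `(A ASSIGN. '1')`, and `translate_arith("A+1+B")` yields `(+((+(A,'1')),B))`.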


ARITE EXP 


STR 


TYPE LIST | TYPE LIST<A,. A>, <BLo'A'D,<A,Bie A’, ADS 


TYPE LIST<t..t'> → DEC<INTEGER t..t'>;


STM<s..s'>, DEC<d..d'> → PROGRAM<BEGIN d; s END..LET d' IN s'>;


Appendix 2.2 DEFINITION PR gt 
Set definitions for string variables: | r,s ∈ STR |


CHAR 


DIGIT<0>,<1>, ... ,<9>;

LETTER<A>,<B>, ... ,<Z>;

WARK<,> o>, a 2 *O>; 

DIGIT<p> | LETTER<p> | MARK<p> → CHAR<p>;


STR  STR<Λ>;


STR<s>, CHAR<c> → STR<sc>;


Definition of primitive functions 
CAT a 


EQ{a,8) 


comD(*,a,6) 


succ a 


PRED @ 


REC SUM(X,Y)


imitive actions 
[: Pred © Petal be be ) . 
w/a. ++ TRUE 
[vs a/. +. PALSE ] an 
Es 8 ] - 
PALSE ~+- 6 
/s0/r. ++ sir 
/al/r. e+ s2r 
: /e/ 
/e8/r. «+ s9r 
/a9/r. + = /e/Or 
‘fre. +e Ar 
/0/. +2 0 
teeter: Jah 
s0/r. + 9/9r 
/sl/r. «+ e0Or fel 
/s2/r. 7° ale 
/a9/r. d. str 
EQ(Y,'0') => X
ELSE => SUM(SUCC X, PRED Y)
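
The recursive definition of SUM moves one unit at a time from the second argument to the first until the second argument is '0'. The Python sketch below shows the recursion scheme on decimal strings. It is an illustration under stated assumptions: `succ` and `pred` here use Python integer arithmetic as a stand-in for the Markov-algorithm definitions of SUCC and PRED, and `pred("0")` is taken to be "0", which is how the PRED rules in this appendix appear to read.

```python
def succ(s: str) -> str:
    # Stand-in for SUCC: successor of a non-negative decimal string.
    return str(int(s) + 1)

def pred(s: str) -> str:
    # Stand-in for PRED: predecessor, with pred("0") assumed to stay "0".
    n = int(s)
    return str(n - 1) if n > 0 else "0"

def rec_sum(x: str, y: str) -> str:
    # REC SUM(X,Y) = EQ(Y,'0') => X  ELSE => SUM(SUCC X, PRED Y)
    if y == "0":
        return x
    return rec_sum(succ(x), pred(y))
```

The recursion depth equals the value of the second argument, so the sketch is only meant for small operands.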




Appendix 2.3 DEFINITION OF AN BY. 


(a) Set definitions for string veriables: ! Ppt" 858" oy Fy see + og oFs ester | 
£,¥, VARIABLE | p,p' ¢ PrR | i,j,k ¢ INDEX | h,h' c¢ EXP ED | t,t" ¢ EXP TL | 
bh, ¢ SEQ ED | t, © 8kQ TL | asat © LABZL stR |: 


BxP TL 
SEQ ED, 63Q TL 


DIGIT<0>,<1>, ... ,<9>;

LETTER<A> , <B> <3> <a>, <P> <u>; 

MARK<$> ,<4>, ot )>5 

DIGIT<p> | LETTER<p> | MARK<p> → CHAR<p>;


CHAR<c> + 8TR CHAR<a> ,<A>,<.>,<(>,<)>, f}>,<*2°>5 
STR<Λ>;
STR<s>, STR CHAR<c> → STR<sc>;


STR<s> → CONSTANT<'s'>;
CHAR<c> → VARIABLE<SEQ(c)>;


PTR<1>;
PTR<p> → PTR<1p>;
DIGIT<d> → INDEX<SEQ(d)>;


LABEL STR< A>; 
LABEL STR<e>, VARIABLE<2> +> LABEL STR<st'; 


CONSTART<p> | YVARIABLE<p> EXP<p>; 

EXP<e>,<f>, IJEDEX<i> exp<(,¢ £)>3 
VARIABLE<v>, EXP<e>, INDEX<i> EXP<Ayv.0>3 
VARIABLE<v>, EZXP<e>, INDEX<i> EXP<(;¥ ABSIGH. ©)>; 
SEQ<s> ; BIP<s>, 

EIP<e>, IBDEX<i> ExP<( GOTO. ©)>; 


IBDBX<i>,<Jj>,<k> - <2. 6-2 30.( 6 a)>3 
EXP<e>, “T<t>, IBDEX<i>,<j>,<k> +» matt (Je §) aveeed>s 
SEQ<e>, VARIADLE<t> o angers), 

BEQ<e>, Tct>, EXP<e>, IEDEX<1>,<J>,<k> + saa<(,(,% @) Aysa)>5 


CONSTAET<c>, VARIABLE<y>, INDBX«i> + EXP BD<c>,<v>,<(,>,<A,>, 
<v assran.>,<doro.$; 

RXP<bt>, SXP ED<h> +> EXP TL<t>; 

VARIABLE<2>, EXP<e>, SEQ<h Creh,> > BBQ ED<h>, 8BQ TLch,>3 




(b) Substitution rules


[Diagrams of the substitution rules are not recoverable from the scanned source. Legible labels: Control, Result, Environment, Store, Expression; Exit; Evaluate Variable; Evaluate Label Ref; Evaluate Constant; Apply Goto; Apply Assign; Apply Constant; Return Value.]


Appendix 3.1 CANONICAL SYSTEM SPECIFYING THE SYNTAX OF SNOBOL/1


DIGIT  DIGIT<0>,<1>, ... ,<9>;

LETTER  LETTER<A>,<B>, ... ,<Z>;

MARK MARK<2>,<.>,<m>, 005 gt />5 

BASIC SYMBOL  DIGIT<p> | LETTER<p> | MARK<p> → BASIC SYMBOL<p>;


STRING  BASIC SYMBOL<b> → STRING<SEQ(b)>;


NAME  DIGIT<p> | LETTER<p> → NAME<p>;
NAME<m>,<n> → NAME<mn>,<m.n>;
STR NAME  NAME<n> → STR NAME:STR REFS<n:n,>,<$n:n,>;
VAR NAME  NAME<n> → VAR NAME:VAR REFS<n:n,>;
BACK REF NAME  NAME<n> → BACK REF NAME:BACK REFS<n:n,>;


DIGIT STR  DIGIT<d> → DIGIT STR<SEQ(d)>;
INT  DIGIT STR<s> → INT<s>,<-s>;
ARITH EXP  INT<i> → ARITH OPERAND<"i">;
STR NAME<n> → ARITH OPERAND<n>;
ARITH OPERAND<a>,<b> → ARITH EXP<a+b>,<a-b>,<a*b>,<a/b>;




STRING EXP  STRING EXP<Λ>;
STRING<s> → STRING EXP<"s">;
STR NAME<n> → STRING EXP<n>;
ARITH EXP<a> → STRING EXP<a>;
STRING EXP<s>,<t> STRING EXP<sGt>; 




PATTERR STRING<s> PAT EXxp<%_%>; 
STR RAPE<n> PAT EXP<n>; 
VAR NAPE<n> PAT Exec@n@>, 
VAR NAME<n> PAT EXt<®(n)®> 
VAR WAME<n>, DIGIT STR<d> PAT EFP<On/4*>; 
BACK REF RAME<n> PAT EXP<n>; 
PAT EEP<p>,<q> + PAT BXIP<p@q>, 
PAT EX°:STR REFS: VAR REFS: BACK REFS<pir ir try> DIPP KANE LIST<r >, 


a] 
LI:L2:INTERSEC<r cr iri racers A> ° plrrtay:str REFS: VAR rerstpirsrytry3 


ASSIGH RULE STR WAME<n>, STRING EXP<a> + ASSIGN RULE<nes>; 


AARDANDAND WUvwuan 


ceo oe we, te 


QAM wr 


PAT MATCH RULE} STR BAME<n>, STRING EXP<g>, PATTEAN<p>+ PAT MATCH RULEcndpes> 5 
INPUT RULE  PATTERN<p> → INPUT RULE<SYS .READ p>;
OUTPUT RULE  STRING EXP<s> → OUTPUT RULE<SYS .PRINT s>;


RULE  ASSIGN RULE<r> | PAT MATCH RULE<r> | INPUT RULE<r> |
OUTPUT RULE<r> → UNLABELED RULE<r>;
UNLABELED RULE<r> + RULE<Qr>; 
UNLABELED RULE<r>, SANE<n> + RULE: LABELS<aQr:n,>; 


LABEL EXP SANE<n> > LABEL EXP: LABEL REPS<nrn,>3 
STR BAME<n> + LABEL EXP<$n> 
ST” RULE<r>, LABEL EXP<t>,<@> + STM<r>,<r/(t)>,<r/8(2)>,<r/S( 1) FP (m)> er /F(m)>,<r/Plmdsli)>; 


STM SEQ STN<6> + STM SEQ<e>; 
STN SEQ<q>, STN<e> + 6TH SEQ<q}s>; 
STM SEQ<q>, STRING<s> + STH SEQ<qd®a>,<®sdbq>; 


SROBOL STM SEQ: LABELS: LABEL REPBcq:t:t,>, NAMECa>, DIFF WANE LEST<EED, (>; 
PROGRAW L:L2: IBTERSECCERD,£:0,£,:RED,A> » SHOBOL PPOGRAN@DEED a>; 


NAME LIST  NAME LIST<Λ>;
NAME LIST<ℓ>, NAME<n> → NAME LIST<n,ℓ>;


DIFF CHAR DIPF CHAR<A:B>,<AtC>, 2... ,<@10>3 
DIFF STR DIPF CHAR<x:y>, CRAR STR<axe >, <aye o> + DIFP STR<exe, teys,>3 
DIFF NAME  NAME<n>,<m>, DIFF STR<n:m> → DIFF NAME<n:m>;


IN  NAME<n> → IN<n:n,>;
IN<n:ℓ>, NAME<m> → IN<n:m,ℓ>,<n:ℓm,>;

NOT IN  NAME<n> → NOT IN<n:Λ>;
NOT IN<n:ℓ>, DIFF NAME<n:m> → NOT IN<n:m,ℓ>;


NOT CONT  CHAR<c> → NOT CONT<c:Λ>;
NOT CONT<c:s>, DIFF CHAR<c:d> → NOT CONT<c:sd>;


DIFF NAME LIST  DIFF NAME LIST<Λ>;
DIFF NAME LIST<ℓ>, NAME<n>, NOT IN<n:ℓ> → DIFF NAME LIST<n,ℓ>;


L1:L2:INTERSEC  NAME LIST<ℓ> → L1:L2:INTERSEC<Λ:ℓ:Λ>;
L1:L2:INTERSEC<ℓ₁:ℓ₂:i>, NAME<n>, IN<n:ℓ₂> → L1:L2:INTERSEC<n,ℓ₁:ℓ₂:n,i>;
L1:L2:INTERSEC<ℓ₁:ℓ₂:i>, NAME<n>, NOT IN<n:ℓ₂> → L1:L2:INTERSEC<n,ℓ₁:ℓ₂:i>;
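
The list predicates above are all defined by induction on the first list: a name is prepended, and the result depends on whether that name is IN or NOT IN another list. The Python sketch below follows the same induction for two of them. It is an illustration only: Python lists of strings stand in for the comma-terminated name lists, and the function names are assumptions made for the example.

```python
from typing import List

def diff_name_list(names: List[str]) -> bool:
    # DIFF NAME LIST<Λ> holds; DIFF NAME LIST<n,ℓ> holds when
    # DIFF NAME LIST<ℓ> holds and n is NOT IN ℓ.
    if not names:
        return True
    n, rest = names[0], names[1:]
    return n not in rest and diff_name_list(rest)

def intersec(l1: List[str], l2: List[str]) -> List[str]:
    # L1:L2:INTERSEC<Λ:ℓ:Λ> is the base case. Prepending n to the first list
    # prepends n to the intersection only when n is IN the second list.
    if not l1:
        return []
    n, rest = l1[0], l1[1:]
    i = intersec(rest, l2)
    return [n] + i if n in l2 else i
```

A check such as "every referenced name is also defined" can then be phrased as comparing a list with its intersection against another list, which is the style of constraint these predicates make expressible inside a canonical system.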




Appendix 3.2 CANONICAL SYSTEM SPECIFYING THE TRANSLATION


OF SNOBOL/1 INTO THE TARGET LANGUAGE


STR NAME  NAME<n> → STR NAME<n..n>,<$n..(LOOKUP. n)>;


ARITH EXP  INT<i> → ARITH OPERAND<"i"..'i'>;
STR NAME<n..n'> → ARITH OPERAND<n..n'>;
ARITH OPERAND<a..a'>,<b..b'> → ARITH EXP<a+b..(+(a',b'))>,
<a-b..(-(a',b'))>,<a*b..(*(a',b'))>,<a/b..(/(a',b'))>;


STRING, EXP STRING BXP<A,.°A'>s 
STRING<s> + BIRING ExP<%™,, 
STR WAME<n, .n’> > STPING EXP<n 
ARITH EXP<a..a'> → STRING EXP<a..a'>;
STRING EXP<s..a'>,<t..t'> + STRING EXP<00t..((CAT 8') t*)>3 


ce cee 


BTRING<s> 

STR MAME<n,.n'> 

VAR RAME<n> 

VAR BNANE<n> 

VAR NAME<n>, DIGIT STR<d> 
BACK REF RAME<n> 

PAT EXP<p..p'>,<q..q'> 
PAT EXP<p..p'> 


PAT EXP<™s, .ty'>; 

PAT EXP<n..n*>3 

PAT EXP:SPECS<®n®,.'n’ : neSTR |>; 

PAT EXP:SPECS<®(n)*®..'n’ : neBAL STR |>3 

PAT EXPISPECS<*n/d®..'n’ : (m,@)cFTX LK STR |>; 
PAT EXP<n..*n'>; 

PAT EXP<00q..((CAT v') a')>3 

PATTERN<p..p'>3 : 


ooeeeceee 
eee eeee 


ASSIGN RULE  STR NAME<n..n'>, STR EXP<s..s'> → ASSIGN RULE<n=s..(n' ASSIGN. s')>;


2 A ARARRANDDA Vou 


PAT MATCH RULE | STR MAME<n..n'>, STR EXP<s..6'>, PATTERN:SPECS:VAR REFS<p..p':c:y> 
+ PAT MATCH RULE<aUp=s..(MATCH_AND_ASSIGH(n’, p*, Awe’, ‘et, MCw)')>5 


. 
. 


INPUT RULE PATTERN: SPECS: VAR REPS<p..p':c:¥> 
+ INPUT RULE<SYS .READ p.. (MATCH_AND_ASSIGN(READER#, p', Aw,'At, 'o’, lw) )>5 


OUTPUT RULE STRING EXP<s..8'> + OUTPUT RULE(SYS .PRINT s..(PRINTERG ASSIGN, ((CAT PRIBTERS) 8'))>5 


» 
° 


RULE  ASSIGN RULE<r..r'> | PAT MATCH RULE<r..r'> | INPUT RULE<r..r'>
| OUTPUT RULE<r..r'> → UNLABELED RULE<r..r'>;
UNLABELED RULE<r,.r'> + RULE<Or..r'>; | 
UNLABELED RULE<r..r‘'>, NAME<n> + RULEcnGr.. n ir'>; 


LASEL EXP BAVE<n> + LABEL EXP<n.. ited 
STR RAME<n..n'> + LABBL EXP<$n,,(LOOKUP. ((Cat *.*) n)); 
St™ RULE<r..r'>, LABEL EXP<i,.d'>,<m..m'> + STMcr,or’>,<r/(h).or's 
<r/slt)..r* @ (GOTO, t') ELSE @'A'>,<r/s(1)P(m)..r* (GOTO. t') RLSE 
- ( over/Pimdelt)ecr? apeoote, t') ELBE 
< oor! ‘A’ ELSE @ (GOTO. i')>,<r/Finds oo? GoTo, ¢ 
r/Fla)..e Pah is 
STM<s..8°> + STM SEQ<s,.8°>; 
STM SEQ<q..q°>, STM<s..9°>+ STN SEQ<488..9'38'>3 ; 
STM SEQ<q..9'>, STRING<s> + STH SEQ<qdes,.0°>,<Omdq..a'?5 
ma. | BTN SEQ:STR REFS<q..q':8_>, NAME<n>, LIST: BVS:CORR BULL LIST<o ivyse> 
aed + SROBOL PROORAMC > ED a..LET viet IB (GOTO, 'n'); a'>3 


LIST: BVS: CORR MAME<n> + LIST: BV8S:CORR WULL LIST<nin:'A'>; 
BULL LIST LIST: BYS:CORR MULL LisT<e:d:2>, BAMEcn>, IN<n:t> 
+ LIST: BVS:CORR NULL LIST<t,nrbex>¢ 
LIST: BYS:CORR NULL LIST<k:b:x>, BANE<n>, WOT IN<n:t> 
+ LIST: BVS:CORR HULL LIST<£,n:b:x,°A'>5 




Appendix 3.3 DEFINITION OF PRIMITIVE FUNCTIONS FOR SNOBOL/1


Set definitions for string variables: | r,s ∈ STR | b,c ∈ BAL STR |


DIGIT<0>,<1>, ... ,<9>;

LETTER<A>,<B>, ... ,<Z>;

MARK<4> ,<->, 0. 4 <%? 

DIGIT<p> | LETTER<p> | MARK<p> → CHAR<p>;


STR<Λ>;
STR<s>, CHAR<c> → STR<sc>;


BAL STR  STR<s>, NOT CONT<(:s>,<):s> → BAL STR<s>;
BAL STR<s>,<t> → BAL STR<(s)>,<st>;
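
Read as an inductive definition, BAL STR covers every string with no parentheses, every balanced string wrapped in one pair of parentheses, and every concatenation of two balanced strings. The Python sketch below checks membership in that set with a depth counter. This is an equivalent test chosen for brevity, not the derivation procedure of the canonical system, and the function name is an assumption made for the example.

```python
def is_bal_str(s: str) -> bool:
    """Return True when s is balanced with respect to '(' and ')'."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a ')' with no matching '(' before it
                return False
    return depth == 0          # every '(' must have been closed
```

The null string and any parenthesis-free string are accepted by the first production; strings such as `(A)(B(C))` follow from repeated use of the second.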


FIX LN STR  FIX LN STR<Λ:0>;
FIX LN STR<s:m>, SUCC<m:n>, CHAR<c> → FIX LN STR<sc:n>;


DIFF CRAR<A:B> ,<A:C>, 1... ,<Pr>3 
CHAR<c> → NOT CONT<c:Λ>;
NOT CONT<c:s>, DIFF CHAR<c:d> → NOT CONT<c:sd>;


STR OF NINES:ZEROS<9:0>;

STR OF NINES:ZEROS<n:y> → STR OF NINES:ZEROS<n9:y0>;
STR<s> → SUCC<s0:s1>,<s1:s2>, ... ,<s8:s9>;
STR<s>, STR OF NINES:ZEROS<n:y> → SUCC<n:1y>,<s0n:s1y>,<s1n:s2y>
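
The SUCC productions define the successor of a decimal string without arithmetic: a final digit below 9 is bumped, and a trailing run of nines is replaced by a run of zeros of the same length while the digit in front of it is bumped (or a leading 1 is added when the whole string is nines). The Python sketch below follows that case split directly on strings. The function name and the use of `rstrip` to find the run of nines are choices made for the example.

```python
def succ(s: str) -> str:
    """Successor of a non-negative decimal string, by string manipulation."""
    stripped = s.rstrip("9")             # part of s before the trailing nines
    zeros = "0" * (len(s) - len(stripped))
    if not stripped:
        # SUCC<n:1y> : the whole string is nines, e.g. 999 -> 1000
        return "1" + zeros
    head, last = stripped[:-1], stripped[-1]
    # SUCC<s0:s1>, ... ,<s8:s9> when there are no trailing nines, and
    # SUCC<s0n:s1y>, <s1n:s2y>, ... when a run of nines follows the digit
    return head + str(int(last) + 1) + zeros
```

Only a single digit is ever converted to an integer, so the carry is handled entirely by the nines-to-zeros replacement, as in the productions.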


(b) Miscellaneous basic primitives


CAT a= [ «. oes cag “r"] q 
rele,3) « [o7: a 
REQ(s,B) = pa ar ] <P 
COND(#,a,3) = [ m3, cos J 7 
ae 
FALSE ++ PA 
AmD(a,B) = PALSE/TRUE ++ PALSE </s 
PALSE/FALSE ++ YFALSE 

aes [99 Be H } e 
Tes [ ies? aaa e ] = 
(») et x 
ABB a [ =: ee se } * 
NEGATE a [ a ach, a8 ] sd 
18_P08 a = [ ee 3) gnu / v 
18_NEG a = [ vig oo PALE ] 

/s0/r. -* slr 

f/el/r. o> e2r 
succ a : fel 

/a8/r. ++ 89r 

/a9/r. + = /e/Or 

Hie. oe lr 

/0/. 7 0 

(1/9m +9 oF 

/e0/r. + /al/9r 
PRED a © /sl/r. 7 s0Or (af 

/a2/r. ody sir 

/99/r. ee e8r 




REC ∸(X,Y) = EQ(Y,'0') => X
ELSE => ∸(PRED X, PRED Y)
REC SUM(X,Y) = EQ(Y,'0') => X
ELSE => SUM(SUCC X, PRED Y)
SIGN(X,Y) = AND(IS_POS X, IS_POS Y) => 'Λ'
AND(IS_POS X, IS_NEG Y) => '-'
AND(IS_NEG X, IS_POS Y) => '-'
ELSE => 'Λ'
LESS(X,Y) = NEQ(∸(Y,X), '0')
DIFF(X,Y) = LESS(X,Y) => NEGATE(∸(Y,X))
ELSE => ∸(X,Y)
REC PROD(X,Y) = EQ(Y,'0') => '0'
ELSE => SUM(X, PROD(X, PRED Y))
REC QUOT(X,Y) = LESS(X,Y) => '0'
ELSE => SUM('1', QUOT(∸(X,Y), Y))


+(X,Y) = AND(IS_POS X, IS_POS Y) => SUM(X,Y)
AND(IS_POS X, IS_NEG Y) => DIFF(X, ABS Y)
AND(IS_NEG X, IS_POS Y) => DIFF(Y, ABS X)
ELSE => NEGATE(SUM(ABS X, ABS Y))
-(X,Y) = +(X, NEGATE Y)
*(X,Y) = LET S=SIGN(X,Y) IN CAT(S, PROD(ABS X, ABS Y))
/(X,Y) = LET S=SIGN(X,Y) IN CAT(S, QUOT(ABS X, ABS Y))
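
The signed operations are built from unsigned ones: the sign of the result is computed separately and concatenated in front of the result on the absolute values. The Python sketch below illustrates this for multiplication. It rests on assumptions that the scan does not fully confirm: that SIGN yields '-' exactly when the operands differ in sign and the null string otherwise, and that PROD is repeated addition counting the second operand down to '0'. The helper names are hypothetical, and Python integers stand in for the string primitives SUM and PRED.

```python
def is_neg(x: str) -> bool:
    return x.startswith("-")

def abs_val(x: str) -> str:
    # ABS: drop a leading minus sign if there is one.
    return x[1:] if is_neg(x) else x

def sign(x: str, y: str) -> str:
    # Assumed reading of SIGN: '-' for mixed signs, null string otherwise.
    return "-" if is_neg(x) != is_neg(y) else ""

def prod(x: str, y: str) -> str:
    # Assumed reading of REC PROD on unsigned operands:
    # PROD(X,'0') = '0', otherwise SUM(X, PROD(X, PRED Y)).
    if y == "0":
        return "0"
    return str(int(x) + int(prod(x, str(int(y) - 1))))

def times(x: str, y: str) -> str:
    # *(X,Y) = LET S=SIGN(X,Y) IN CAT(S, PROD(ABS X, ABS Y))
    # Note: this literal reading gives "-0" for a zero product of mixed signs.
    return sign(x, y) + prod(abs_val(x), abs_val(y))
```

Division would follow the same shape with QUOT in place of PROD.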


(c) Basic pattern matching function


REC ASGN_LIST(L,M) = LET H,T = HD L, TL L
IN EQ(H,'Λ') => 'Λ'
ELSE => LET V1=(LOOKUP. H) IN (V1 ASSIGN. (HD M));
ASGN_LIST(T, (TL M))
MATCH_AND_ASSIGH(BANE, PAT, STR_BXP,SET_BPECS, VARS) 


8. - 


{ srr. cs 
LET so { Je Patt. o (vans, (0), (e),) | BANE) 


IN BQ (#,'A*) => *FALSB'- 
ELSE => LET 91,792,983 © EDs, ED (TL ©), BD (TL (ft 2)) 
IB ASGH_LIST( VARS, #1); 
LET “STR_BXP = (STR_EXP 'A‘) 
39 (aaa ASSIGE. rer toan( (cae 32) STR_BXP)) 93); 


(d) Definition of LOOKUP. to be added to evaluator


LOOKUP. APPLY. 


(ps) 




Appendix 4.1 CANONICAL SYSTEM SPECIFYING THE SYNTAX OF ALGOL/60


DIGIT<O> ,<1>, . 
LETTER< A> ,<3B> 
MARE<@> <->, , 
BIGIT<p> | LETT: 


<9>5 
<B> cham cir, 22. ot BOE 
b s}>3 


> MARK<p> + CHAR<p>; 


DIo sta DIGIT<@> + Dia StR<BBq(4)>; 
LST ste LETTER<t> + LEY sta<sEq(s)>; 
ID eTR * ID STR<t>y 
+ ID stRcit>, 
DIGIT<@> = + TD BTRcLa>s 


Ste 


ee eee se 


+ BTR<SEQ(e)>; 


PAR DELIN [PAR DBLIN<,>; 
LET STR<t> + PAR DBLIN<)e*:°(>; 


ee 


LABEL: VAL ID STR<e> + LABEL: VAL<s:a>, 
LABBL (VALCL 92>, <¢212>, 01. o< 9295 
LABEL: VAL<irv>, DIGIT<a> > LABEL: VAL<té:vd>; 
LABEL: VAL<f:wv>, DIGIT STN<L> + LABEL: VAL<Ottw>; 


rene Re WAM ewe re 


ADD OP  ADD OP<+>,<->;
MULT OP WULT OPsa> ,<*/*>,<O>; 
BRL OP RRL OP<* <*>, <g> <u> <p> ,c*>*> egg 


UNSIGN INT  DIGIT STR<s> → UNSIGN INT<s>;

UNSIGN NUM  DIGIT STR<s>,<t> → UNSIGN NUM<s>,<.t>,<s.t>;


INT  UNSIGN INT<i> → INT<i>,<+i>,<-i>;
NUM  UNSIGN NUM<n> → NUM<n>,<+n>,<-n>;


Re Free whe 


ID  ID STR<i> → ID<i>;
IDLIST  IDSTR<i> → IDLIST<ALTSEQ(i ,)>;
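
The abbreviation ALTSEQ(i ,) appears to stand for an alternating sequence: one or more items of the given kind separated by the given delimiter, so that an IDLIST is a comma-separated list of identifiers. The Python sketch below tests a string against that shape with a regular expression. The reading of ALTSEQ, the function name, and the identifier pattern used in the example are assumptions made for illustration.

```python
import re

def matches_altseq(text: str, item_pattern: str, delimiter: str) -> bool:
    """True when text is item (delimiter item)* with at least one item."""
    item = f"(?:{item_pattern})"
    pattern = f"{item}(?:{re.escape(delimiter)}{item})*"
    return re.fullmatch(pattern, text) is not None

# Hypothetical identifier shape: a letter followed by letters or digits.
IDENT = r"[A-Z][A-Z0-9]*"
```

With these definitions, `matches_altseq("A,B1,C", IDENT, ",")` holds, while a trailing or doubled delimiter is rejected.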


AN AROAR VV FETE Ue DONNY & Oe 


var ARITE EBXP<a@> SUBCORIPT LIST: DIMM: 1>5 
ARITH BAP<q>, SUBSCRIPT LIST: DINM<e:m> SUBSCRIPT LIST: DINM<t ,a:al>; 
Ip<i> RRAL/INT/BOOL VAR: R/I/B YARG<i:21,> 
ID<i>, SUBSCRIPT LIST: DIMMH<£ :m> REAL/IBT/BOOL VAR:R/1/B ARRAYS<i th i(m),>5 


Ip<i> > ACT PAR:SPECS1S VARS<1:8WITCH,:3,>5 

ID<1>, SPRC LISP<2> + ACT PAR: SPRCS:R/I/B/R PROCK< 1 :REAL/INTEGER/ BOOLEAN /BONVAL PROCEDURE(E),:1(2),>3 
ID<i>, DIMme- * ACT PAR:GPECS:R/I/B ARBATO<11RBAL/IOTRGER/DOCLEAN AgRAY(m),:1(m),>4 
REAL/IET/BOOL. VARCy> + ACT PAR: SPRCE<v:RRAL, /IRPEGER,/BOOLEAR,>; | 

ARTTR EXP<a> * ACT PAR1SPECE<a:ARITH BSP.>s 


rene 


Say cision 
; 7 ° AR: as >3 
PAR<p>, PAR DELIN<O? + ACT Pam PAaT<aLesne(p 4)>; 
+ REAL/IST/DOOL/BOVAL PCH DES:R/I/B IB PROCE<4:1(),>; 
ACT PAR PART:SPECH<pix,> + REAL/ZNT/BOOL/RORVAL PCH DESiR/I/B IW PROCB<i(p)11(2),>5 
REAL Pow DES<f> | INT PCR DES<f> | BOOL PCH DES<f> | MONVAL PCH DES<f> » pew DRS<r>; 


weetav rune 


UNSIGN NUM<p> | REAL VAR<p> | INT VAR<p> | REAL FCN DES<p> | INT FCN DES<p> → PRIMARY<p>;
ARITH EXP<a> → PRIMARY<(a)>;

PRIMARY<p>, MULT OP<m> → TERM<ALTSEQ(p m)>;

TERM<t>, ADD OP<a> → TERM SEQ<ALTSEQ(t a)>;

TERM SEQ<s> → SIMPLE ARITH EXP<s>,<+s>,<-s>;

SIMPLE ARITH EXP<s> → ARITH EXP<s>;

BOOL EXP<b>, SIMPLE ARITH EXP<s>, ARITH EXP<a> → ARITH EXP<IF b THEN s ELSE a>;


BOOL PRIM<TRUE>,<FALSE>;

SIMPLE ARITH EXP<a>,<b>, REL OP<r> → RELATION<arb>;
RELATION<p> | BOOL VAR<p> | BOOL FCN DES<p> → BOOL PRIM<p>;
BOOL EXP<b> → BOOL PRIM<(b)>;

BOOL PRIM<p> → BOOL SEC<p>,<¬p>;

BOOL SEC<s> → BOOL FAC<ALTSEQ(s ∧)>;

BOOL FAC<f> → BOOL TERM<ALTSEQ(f ∨)>;

BOOL TERM<t> → BOOL IMP<ALTSEQ(t ⊃)>;

BOOL IMP<i> → SIMPLE BOOL<ALTSEQ(i ≡)>;

SIMPLE BOOL<s> → BOOL EXP<s>;

BOOL EXP<a>,<b>, SIMPLE BOOL<s> → BOOL EXP<IF a THEN s ELSE b>;


LABEL : VAL<8 > + SIMPLE DES EXP: LABEL REPS<t:¥,>3 

ID<i>, ARITH EXP<a> → SIMPLE DES EXP:S VARS<i[a]:i,>;

DES EXP<d> → SIMPLE DES EXP<(d)>;

SIMPLE DES EXP<s> → DES EXP<s>;

BOOL EXP<b>, SIMPLE DES EXP<s>, DES EXP<d> → DES EXP<IF b THEN s ELSE d>;


ARITH EXP<e> | BOOL EXP<e> | DES EXP<e> → EXP<e>;
DUMMY STM<Λ>;

STR<s>, NOT CONT<;:s> → COMMENT STM<COMMENT s>;
DES EXP<d> → GOTO STM<GO TO d>;

FCN DES<f> → PROC STM<f>;




UNCOND STM 


COMD STK 


St 


STM SEQ 


CONPOURD STK 


TYPE DEC 


ARRAY DEC 


sw pgc 


FORMAL PAR 
PART 
VALUES PART 


SPECIFIER 
PART 


27.12] PROC DEC 


26,1 
28.2 
26.3 


29. 


DEC 
DEC skQ 


BLOCK 


‘ sare) rviey, 


IDSTR<1> 

REAL/INT/BOOL VAR<1>, IDSTR<i> 
REAL/INT/BOOL VAR<4 (1 )> 

R/1/B LEFT PART<R>, ARITH/ARITH/BOOL EIP<e> 
R/T/B LEPT PART<4>, R/I/B ASOT STH<e> 
R/I/B ASCT STH<e> + ASOT STH<e>; 


B/3/3 Cart Pant: ascuep PROC 108<111,>; 
R/1/B LEPT PART: ASGBBD. Vans<i:1,>; 
R/T/B Lert PaRT<3(a}>; 

R/3/B ASOT BTH<A 207; 

R/T/B ASOT SPH<Lre0>y 


eetoe 


ARITH EXP<a> → FOR LIST EL<a>;

ARITH EXP<a>,<b>,<c> → FOR LIST EL<a STEP b UNTIL c>;

ARITH EXP<a>, BOOL EXP<b> → FOR LIST EL<a WHILE b>;

FOR LIST EL<e> → FOR LIST<ALTSEQ(e ,)>;

REAL INT VAR<v>, FOR LIST«<t>, ‘STM: LABELS sLARRL BEPS<erh:t Pig] LirL2:RBL COMP: 4. verbs 
DIFF RETRY LIST<C> + POR STM:LABELS:LABEL REFS<POR viet" DO erhthers 


DUMMY STM<s> | COMMENT STM<s> | GOTO STM<s> | PROC STM<s> | ASGT STM<s> | FOR STM<s> |
BLOCK<s> | COMPOUND STM<s> → UNCOND STM<s>;
UNCOND STM<u>, LABEL:VAL<ℓ:v> → UNCOND STM:LABELS<ℓ':'u:v,>;


BOOL EXP<b>, UNCOND STM<u> → COND STM<IF b THEN u>;
BOOL EXP<b>, UNCOND STM<u>, STM<s> → COND STM<IF b THEN u ELSE s>;
COND STM<s>, LABEL:VAL<ℓ:v> → COND STM:LABELS<ℓ':'s:v,>;


UNCOND STM<s> | COND STM<s> → STM<s>;
STM<s> → STM SEQ<s>;
STM<s>, STM SEQ<q> → STM SEQ<q;s>;


STM SEQ<s>, STR<c>, NOT CONT<;:c>,<END:c>,<ELSE:c> → COMPOUND STM<BEGIN s END c>;


IDLIGT<4> + TIPE DEC: DEC R/I/B VARS<REAL/IETECER/BOOLEAN 4:4,>; 
IDLIST<t> + TYPE DEC: DEC R/I/B VARS<OVE REAL/TRERGER/BOOLEAR 28,5 


ARITH EXP: VARS: I VARS:B VARS:S VARS:R ARRAYS iT anpay! BRATS iN PROCS: PROCE:B PROCS: # PROCS 
J #. . 7 iJ ° ¢ MJ f 
PU eh eRe ee tl ee Reh Chee Sey 


ROUND PAIR: DIM R VARS: DIN X VARS: DIM B VAREsDIN S VARS;DIM R ARRAYS:DIM I ARRAYS 


DIM B ARRAYS: DIN BR PROCE:DIN I PROCS: DIN B PROCE1 DI: Paces 
cans re WE ta VEE UR TE TTS 1V5 Ol) 20, OF 0,8) IP PPPs? PLP APS” s 
BOUED PAIR<p> > BPLIST: DIMN<p11>, 
BOUND PAIR<p>,. BPLIST: DIMKRem> + BPLIGT: DIMN<tprml>; 


SPLIST: DINNCire>, IDSTA<1> 
ARRAY<4 q 

awpray<ife}>, ARRAY SEG<o[2 )> 
ARRAT SEG<s> ARRAY LIST<s> 
ARRAY SEG<o>, ABRAY LIST<t> ARRAY LIST< 
ARRAY LIST:ARRAT VARS<tiv> > ARRAY DEC: DEC R/TIB/A ARRATS<RBAL/IBTEGER/BOOLEAN/A ARBAY é:v>4 
ARRAY LIBT:ARMAY VARS<irw> + ARRAY DECIDEC B/T/B/A ARRAYS<OWN REAL/ISTECER/BOQCLEAN/A ARRAY ts v>3 


ARRAY: ARRAY wabet(t]sita),>; 
ARRAY SEO<i{t }> 
ARRAY SEG<i,s[t]>; 


eeeode 


ons Exr<@> + 80 LIST<aLTsEa(a ,)>; 
IDOTR<1>, SW LIBT<2> + SW DECIDEC 8 VARS<GWITCH 1:94:1,>; 


TDSTR< {> > PORMAL PARrPARB<111,>; 

FORMAL PAR<p>, PAR DELIN<4> + FORMAL PAR LIST<ALPoRa(p a)>y 
FORMAL PAR PART<A>; 

YORMAL PAR LIST<£> + FORMAL PAR PART<(2)>; 
VALUR PART<A>; 

IDLIST<a> + VALUE PART:PARS<¥VALUE £; 1 &>; 
TYPE<RBAL> sIVTESER> ,<BOOLRAR> ; 

TIPR<t> + SPECIFIRG<LASEL>, <S¥ITCE>, <>, <ARRAT>,<t ANRAY>,sPROCEDURE> ,<t PROCEDURE>; 
IDLIST<2>, SPECIPIER<s> © SPECIPIER LIST:PAMB<0i; : t(0),>4 

SPECIPIER PART<A>; 

BSPECIPIEN List<t> + SPECIFIES PaRnT<88Q(2)>; 

IDSTR<i>, FORMAL PAR PART:PARS<:2,>, VALUE PARTIPARS<uru,>, 


SPECIFIER PART: PARE<ese>, STU:R VARS:2 VARS: B VARS:5 VARS:R 


ABRAYS:1 ABRAYS:B ARBAYS:R PROCS:1 PROCS: B PROCRIN PROCS: LABELS 
tLASEL REPS: ASGHED VARS :ASGNED PROC IDB<s:¥ Pt teehee td rDyt PtP arity tt! Seas’* 
LisL2cIS?ERSEC: REL COMP<¥, etlyiTeette"s eat itaettP ty eit tet t ahs “vy, ‘fy IW eths to, 
eee othe? o Og tL, 18; pt Oi> cay it ey pta >, 
Dette Pag Py? SPs tha IPs gt Pi? o<Py ily Pag 
. 
DIPF BEERY LIST<f_>,< Shahfetretty? octet tgterty oStggifytt 
DIss ENTRY Lisrscte, ab, gty, ctv de, ease layed (a MPs (Py ip eilige (tat, i as)’ 
PANS USES :BPECS<f,16,¥, . ettnaby_(BaPEaEh ng MOOLEAN) Nye (SVT TCH) a. (REAL ARMAY er CCTHTEGER apnay) 
Se (BOOLEAR ARRAY )p., _ (REAL PaoceDunE)» -(UTEGER PROCEDURE)», « (BOOLEAR PROCEDURE) 
Dar (MONVAL PROCEDURE) ef LABEL), (14 11,4 ¢(AS0mED) tz? 


+ PROC DBC; DEC A/1/B/H PROCS: RB VARS:1 VARS:B VARS:S VARS:R ARMAYS:1 ARRAYS:B ARRAYS:P PROCS 
3I PROCS:B PROCS1N PROCS: LABELS: LABEL ARPS:ASCHED VARS:ASGHED PROC IDB 
<REAL/INTEGBR/BOOLEAB/A PROCEDURE ifjuce t(z)iviiv/seiresserresrars 


PRIDE PEL tArearde 2? 


sf 


; ’ 
+E Pap Pa? 
ti,swsa? 


ast’! os*“lpes pas” 


pes’! 


TYPE DEC<d> | ARRAY DEC<d> | SW DEC<d> | PROC DEC<d> → DEC<d>;
DEC<d> → DEC SEQ<d>;
DEC<d>, DEC SEQ<s> → DEC SEQ<s;d>;


(ST SEQ:R VARS;I VARS:B VARG:S VARS:R ARRAYS: 1 “ARRAYS: B ARRAYS: ® PROCS 
3I PROCS:B PROCS: PROCS: LABELS: LABEL REFS<e:¥ ehh MytV ate 8 POLI PID TEA Pasties 


DEC SEQ:R VARS:1 VARS:B VARS:5 VARS1N ARRAYS:I ARRAYS:2 ARRAYS: PRocs 

tI PROCS: B PROCE:H PROCS: DEC R VARS: D2C 1 VARE:DEC B VARS: DEC S VARS 

1DRC R ARRAYS1DEC 1 ARRAYS: DEC DB ARRATS: DEC R PROCS:DEC I PROCErDEC B PROCS 

198C B PROCS: DIN R VARS1DIN f VARS:DIW BS VARSIDIN BS VARE:DIN RB ARRAYS:DIN 1 ARRAYS 
PROCS:DIN I PROCS: DIN B Serene M PROCS: LABEL FEFS 


H SOLED LIDS PL eg thy ate, "oe! Yuet ra ma tag Praia’ ee' one 
MOAT AL brs Wee hen) ids tae be, hee bet 
jevace>, WOT CONT<;:0>,<BEDic> <RLSE:e>, 


17 e 




ot 


ALGOL 
PROGRAM 


SPEC LIST 


SPECL:SPEC2 
COMB 


SPEC MATCH 


SPEC LIST 
MATCH 


USES: PARS 
WITH SPECS 


PARS; USES 
:SPECS 


ENTRY 
ENTRY LIST 


DIFF CHAR 
DIFF STR 
DIFF ENTRY 


mm 


mot I8 


BOT Cost 


DIFF EUTRY 
List 
DISS ENTRY 
LISTS 


L1:L2 
2 INTERSEC 


Li:Le 
sREL COMP 


; tev ry” 7 eee pe 
L1:L2: REL saad Sd? A oar Md CACAO Shy ESAS A ET CA 
‘ Re te”. *. tet 
hr oat kel thal Oe Diletta Ok Yay as he Es Ti ee] 
<PEPEiPyg? Pp” a<PyD LPs giPy? Py PEPE giPy> o<PaPL 2D, athe 
SE LeDEhake>, 


DIFF ENTRY LIST<Y, 4¥,a¥aa¥ea®eatsatyaPraPiaPpaPnal”® 
DISS ENTRY LISTS<(v Viv (vy dC Cada, Ma Cp) (py py dp (aan, 


eC vp a rg Org ge Cr g(a a ler ay gla) (ayy) 


(Peg (Peg Ppa) (Pig) (Pig) (Pag) (PtP, > 
-»BLOCK:R VARS:I VARS:B VARS:S VARS:R ARRAYS:1 ARRATS:B ARRAYS:R PROCS 
tI PROCS:B PROCS: M PROCS: LABELS: LABEL REFS 
BI 3 : tvev, serv rvey seme cate. se” 
< aa ts meee "Pre "i%im'"b"be Ver am pire Min bbe 
Rees Mate aa bee te, el ae 


BLOCK<p> | COMPOUND STM<p> → PROGRAM STR<p>;
PROGRAM STR<s>, LABEL: VAL<i:v> + PROGRAM STR<s":“e>3 
PROGRAM STR:R VARS:I VARS:B VARS:S VARS:R ARRAYS:I ARRAYS:B ARRAYS:R PROCS 
:I PROCS:B PROCS:N PROCS:LABELS:LABEL REPS:ASGNED PROC IDS 
ere Ark ATASATASAS A> 
+ ALGOL PROGRAN<a>; 


TYPE<REAL>,<INTEGER>,<BOOLEAN>;
DIMM<1>;
DIMM<m> → DIMM<m1>;


SPEC<Λ>,<LABEL>,<SWITCH>,<ARITH EXP>,<BOOL EXP>,<ASGNED>,<VALUE>;

TYPE<t> → SPEC<t>,<VALUE t>,<ASGNED t>,<ASGNED VALUE t>;

TYPE<t>, DIMM<m> → SPEC<ARRAY>,<t ARRAY>,<t ARRAY(m)>,<VALUE t ARRAY(m)>;

TYPE<t>, SPEC LIST<s> → SPEC<PROCEDURE>,<t PROCEDURE>,<t PROCEDURE(s)>,<NONVAL PROCEDURE(s)>;
SPEC<s> → SPEC LIST<ALTSEQ(s ,)>;


SPEC<s> → SPEC1:SPEC2:COMB<Λ:s:s>,<s:Λ:s>;

TYPE<t>, DIMM<m> → SPEC1:SPEC2:COMB<ARRAY:REAL ARRAY(m):REAL ARRAY(m)>,
<t ARRAY:t ARRAY(m):t ARRAY(m)>,<t:VALUE:VALUE t>,<t:ASGNED:ASGNED t>,
<VALUE t:ASGNED VALUE t>,<t ARRAY(m):VALUE:VALUE t ARRAY(m)>;

TYPE<t>, SPEC LIST<s> → SPEC1:SPEC2:COMB<PROCEDURE:NONVAL PROCEDURE(s):NONVAL PROCEDURE(s)>,
<t PROCEDURE:t PROCEDURE(s):t PROCEDURE(s)>;


EXP SPEC<Λ>,<VALUE>,<ASGNED VALUE>;

SPEC1:SPEC2:COMB<s:t:c> → SPEC MATCH<s:t>;

EXP SPEC<s> → SPEC MATCH<ARITH EXP:s REAL>,<ARITH EXP:s INTEGER>,
<BOOL EXP:s BOOLEAN>;

SPEC MATCH<s:t> → SPEC LIST MATCH<s:t>;

SPEC MATCH<s:t>, SPEC LIST MATCH<s':t'> → SPEC LIST MATCH<s',s:t',t>;


IDLIST<t> + USES:PARS WITH SPECS<A:£,>; 

TDSTR<i>, SPECI:SPEC2:COMB<s:tic>, USES:PARS WITH SPECS<u: xis .y> 
~ USES¢PARS WITH SPECS<ui(t):zi ¢,y>; 

IDSTR<i>, SPEC]: SPEC2:CONB<s:t:c>, USES:PARS WITH SPECS<u(t):xi8,y7> 
~ USES:PARS WITH SPECS<u,i(t):ai c,y>3 

ESTRY<i(p)>, SPEC1:SPEC2:COMB«s:t(p)}:c>, USES:PARS WITH SPECS<u:xis,y> 
+ USES:PARS WITH SPECS<ui(p)(t)im ic,y>s 

EBTRY<i({p)>, SPEC1;SPEC2:COMB<e:t(p):c>, USES:PAMS WITH SPECS<u(t):2x18,7> 
+ USES: PARS WITH BPEC8<u,i(p)(t):zi c,y?s 

PARS:USES:SPECS<Λ:Λ:Λ>;

USES:PARS WITH SPECS<u: > * PARS : USES :SPEC8<Asu:x>; 

IDSTR<i>, PARS: UBES:SPECS<piu:zi,y> > PARS: USES: SPECS<pi,:u:xy>; 


ID<i>, SPEC LIST<s>, DIMM<m> → ENTRY<i>,<i(s)>,<i(m)>;
ENTRY LIST<Λ>;
ENTRY LIST<ℓ>, ENTRY<e> → ENTRY LIST<e,ℓ>;


DIFF CHAR<ArB>,<A:C>, ... o<E)>3 
CHAR STR OR NWULL<axs>,<ayt>, DIFF CHAR<x:y> + DIPF STR<axezeyt>; 
ID STR<{>,<J>, DIPF STR<1:j>, SPEC LIST<a>,<t> + DIPF EMTRI<1:9>,<i(e):J>,<iss(t)>,siledes (tds 


ID STR<1>, SPEC LIST MATCR<e:t>, DIMM<m> + ENTRY MATCH<iri>,<i(g):i(t)>,<i(m)si(m)>; 
ENTRY MATCH<e1e'> > Ic ai 

IN<ert>, ENTRY MATCH<e:e'> + I< b> <estets>3 

IN<ert>, DIFF ENTRY<e > » [N<ese’,&>,<erke', 

ENTRY<e> → NOT IN<e:Λ>;

NOT IN<e:ℓ>, DIFF ENTRY<e:e'> → NOT IN<e:e',ℓ>;


CHAR STR OR NULL<s> → NOT CONT<s:Λ>;
NOT CONT<sx:t>, DIFF CHAR<x:y> → NOT CONT<sx:ty>;
NOT CONT<sxa:ty>, DIFF CHAR<x:y> → NOT CONT<sxa:tya>;


DIFF ENTRY LIST<Λ>;
DIFF ENTRY LIST<ℓ>, ENTRY<e>, NOT IN<e:ℓ> → DIFF ENTRY LIST<e,ℓ>;
EWTRY LIST<2> + LIST OF LISTS: UNTON<(i }:2>% 
LIST OF LISTS: UNION<2:u>, ENTRY LIST<g'> + LIST OF LISTS: UNION<(i),(2' )ruit’; - 
ENTRY LIST<i> + DISS PAIR OF LISTS<i:h>; 
DISJ PAIR OF LISTS<i:2*>, EMTRY<e>, NOT [W<e:2> +. DIST PAIR OF LISIS<ise,c'>5 
ENTRY LIST<i> > DISS ENTRY LISTS<(k)>, 
DIsd ENTRY LISTS<z>, LIST OF LISTS: UMNION<t:u>, DISJ PAIR OF LISTS<y:i'> 
~ DISS ENTRY LISTS<t(4')>; 


ERTRY LIST<2> + L1:L2:IMTERSEC:<t:AsA», L1:L2: REL COMP<2i:A 3A; 

LI: Le:INTERSEC<izk*:i>, ENDPRY<e>, IN<esh> + LIsL2:INTERSEC<i: 

Ll: L2:IBTERSEC<2:a%:i>, ENTRY<e>, NOT If<ese> + LI: L2:INTERSEC*i 

L1:L2:REL COMP<2rk'rr>, EMTRY<e>, INce:i> + LI:L2:REL COMP<¢ 

L1:L2:REL COMP<isa’sr>, EMTRY<e>, NOT IN<eré> + L1:L2:REL COMP<i:e,t 
LIsL2:IUTERSEC<t:2":i>, LisL2:REL COMP<isi'sr> + Lich2:INTERSEC:REL COMP<a:i':irr7s 




Appendix 4.2 CANONICAL SYSTEM SPECIFYING THE TRANSLATION
OF ALGOL/60 INTO THE TARGET LANGUAGE


DIGIT STR<s>,<t> + UNSIGN MUN<s..'e'>,<.t.. (TRANS FRAC 't')>, 
<s.t..(#(TRANS_INT ‘a', TRABS_FRAC e")) 


UNSIGN INT<i> eos tiers teers Ores Ores OS ay 
UNSIGN NUM<n..n'> → NUM<n..n'>,<+n..n'>,<-n..(NEGATE n')>;


IDSTR<i> + ID:MANEZ FORMALS:OWN VARS<i,.itA:A>,e4..(4 TA) rh ysd>, 
<i, dschti,>; 
IDSTR<i> + IDLIST<ALTSEQ(i ,)>; 


SUBSCRIPT LIST<a..(CONV_TO_IBT a')>; 
SUBSCRIPT List<t,a. {* ,CcoRy_to_r1a7 ep; 
REAL/INT/BOOL VAR<1,.1°>, 

REAL/INT/BOOL VAR<4[i)..(GET_EL (2*,1°))>5 


ARITE EXP<e.,a'> : 
ARITH EXP<e..a'>, SUBSCRIPT LIST<t,.2°> 
ID<1..1°> 

ID<i,.1°>, SUBSCRIPT LIST<k..t*> 


eee 


PCH DES ID<i..4% ACT PARSS..An.4'>3 
ID<4.,4°> PAR<$,.Am.4'?; 
ID<4,.4%> PARSS..A8. 29> 5 
REAL/INT/BOOL YAR<v..v'> PAR<y. 
ARITH EXP<a..a'> PAR<a, 
BOOL EXP<b..b'> PAR<b 
DES EXP<4..a'> - PARCd,.Aw.a'>3 
ACT PAR<p..p’>, PAR DELIM<d> +ACT PAR PART<ALTSEQ(p 4)..ALTSEQ(p* ,)>3 
ID<4.,4°> *REAL/INT/BOOL/NOMVAL PCH DES<i,.(1* "A")>; 
ID<i..4°>, ACT PAR PART<p..p'>+REAL/INT/BOOL/NOMVAL FCR DES<i(p)..(1"(p*,))> 
REAL FCN DES<f,.f'> | INT FCB DES<f..f'> | BOOL FCM DES<f..f'> 
| MOMVAL Fow DES<f..f°> + FCN DES<f..f'>; 


HR CON AY FWP 


ARITH EXP UNSIGN BUM<p..p*> | REAL VAR<p,.p'> | INT VAR<p..p'> | REAL PCE DES<p..p'> 
| INT FOR DES<p..p'> PRINW<p..p’ 
ARITH EXP<a..a’> PRIM<(a)..0'>; 
PRIN<p..p', MULT OP<m> TERM<ALTSEQ(p m)..COMB(p’ m)7; 
TERN<t..t ADD OP<a> TERM SEQ<ALTSEQ(t a)..COMB(t' a)>; 
TERM SEQ<s..8'? SIMPLE ARITH EXP<s..8'>,<¢e..8'>,<-5..(NEGATE 3° )>3 
SIMPLE ARITH EXP<s..8'> + ARITH EXP<s..a'>; 
BOOL ExP<b..b'>, SIMPLE ARITH EXP<a,.8°>, ARITK BXP<a..a’> 
* ABITH EXP<i¥ > THER s ELSE at..d"=p 0° ELSE =p a'>; 


BOOL PRIN<TRUE..‘TRUE'>,<PALSE,,"FALSE'>; 

SIMPLE ARITH EXP<a..e'>,«b..b'>, REL OP<r> + RELATION<ard..(r(at,b* ))>3 
RELATION<p..p’> | BOOL VAR<p..p'> | BOOL PIM DES<p..p’> * BOOL PRIM<p..p'>; 
BOOL EXP<b..b’> + BOOL PRIM<(b}..b°>3 

BOOL PRIN<p..p'> BOOL SEC<p..p'>,< p..( p')>; 

BOOL SEC<s..8'> BQOL FACCALTSEQ(e A)..COMB(a" A)>; 


- 
- 

BOOL PAC<f,.f'> + BOOL TERM<ALTSEQ(f V)..COMB(T’ V)>; 
- 


BOOL TERM<t..t'> BOOL INP<ALTSEQ(t 2)..COMB(t*® 3)>; 

BOOL IMP<i..i*> SIMPLE BOOL<ALTSEQ(i 2)..COMB(i" 2)>; 

SIMPLE BOOL< s*> + BOOL EXP<s..8'>; 

BOOL EXP<b..b*>,<c..¢'>, SIMPLE BOOL<s..8°> ~ KOOL EXP<IF b THEN s ELSE c..b’ s' ELSE c'>; 


LABEL: VAL<t:¥> + SIMPLE DES EXPci.. “*. .¥7G 
Ip<i.,i°>, ARITH EXP<a..a'> + SIMPLE DES EXP<i{a)}..((GET_EL(CONV_TO_INT a',4)) "A')>3 
DES EXP<d..d’> ~+ SIMPLE DES EXP<(d)..4'73 
SIMPLE DES EXP<s..s'> DES EXP<s..a'>; 
BOOL EXP<b..b'>, SIMPLE DES EXP<s..s5'>, DES EXP<d..a°> + DES EXP 
<IF b THEN s ELSE 4..b° @s° ELSE pa'>; 


exp ARITH EXP<e,.e'> | BOOL EXP<e..e'> | DES EXP<e..e'> + EXP<e..e'> 
DUMMY STH DUMWY STH<A..° A'S 

CONNENT STM { STR<o> + COMHENT STM COMMENT<s..°A'>; 

GOTO STK DES EXP<d..d*'> + GOTO STN<GO TO 4..(GOTO. d')>; 

PROC STN FON DES<f..f'> + PROC STM<f..f'>; 


ASGT STH IDSTR<i> + B/1/B LEFT PART<i..(i# ASSIGN. *)>; 

REAL/INT/BOOL VAR<1..1'>, IDSTR<i> + R/I/B LEFT PART<i,,LET a21' IB 

(4 ASSIGH, &)>; 
REAL/INT/BOOL VAR<i(t)..(GETLEL(i',1°)> > R/I/B LEPT PART<i[i).. LET awit IB 
fu ASSIGN. (RESET_EL(t*,i*,e)}-5 
R/I/B LEFT PART<£..0'>, ARITH/ARITH/BOOL EXP<e..@'> + R/I/B ASGT STW czse.. 
LET ye ( CORY TO_REAL/COMY TO_INT/ILEM 7 e') IR e'-; 
R/I/B LEPT PART<i£..2'>, R/2I/B ASGT STM<s..8'> + R/I/B ASST ETM Lres..e' 50! > 
R/1/B ASGT STM<e..8°> + ASGT STN<8..8'>; 


ARITH LIST EL<a,.anea'>; , 
ARITH > cd. .dB'>,<e,.0'> / LIST EL<e STEP b UNTL c..An.(BTEP(An.a',Aw.b*,AB.c%) > 
ARITH EXP<a..0'>, BOOL EXP<b..b‘> LIST EL<a WHILE b,.Ae.(WHILE(A.a*,Am.d*))>5 

POR LIST EL<e..e'> LIST<ALTSEQ( e,).-ALTSEQ(e" 4)73 : 

REAL/INT VAR<v..¥'>, POR LIST<t..t'>, STM<e..8'> STM<POR viet DO 6..(POR(¥', DELAY_CAT 




UNCOND STN COMMENT StN<s..0'> | 
BLOCE<s..8'> | 
UNCOND STM e.,0° , LABEL: VAL tv UNCOED 8TH 0s sw.v t 8* 


BIOL EXP<D..0t>, UBCONS 9Ti<u..u'> @ EADRG: VALE rv, >,<agty,> + COMD STNILADELS<IP t THER u. 
e* a> (GOTO. wv.) ELSE =p(coTe. .v,) “ectatw, bts 
Book RaPen..b'>, Om eVALEL $9 > story. pcharvy> + COND STNILABELE<IP & THEE v, 
»* a (GOTO, x. ~~» 2°e*s(O0TO. ow rt) SI “t Arde, Moov yer 

+a", + COND StMUL- rs. .w doy 


s™ UBCOND STW<s..9'> | COND STM<e..8'> 
STH SEQ "> 
*> STN SEQ<q..4"> STM SEQ<qse..e'3e'>; 


COMPOUBD STA STM SEQ<a>, STH<e> + COMPDUND STIKDROIN « FED e..8'>; : 


TYPE DEC =F IDLIST<2>, LISTICOMA BULL LIST<Ark > — + TPE ARCCRRAL/INTEGER/BOO LEAN 4. . 004.2; 
IDLIST<t>, LISTICORA INDEAED LIST<Ert,> > TYPE DECDEC OvE VARS<OWN RRAL/THTEGER/BOOLEAN ¢..t=1, 20 ,>+ 


ROUND PAL R<a*:“h..a"|b'>; 
setisret ss ate ty 
Pi ss ene Rg Dy 
ADRANsaRbad ibe<tt1).,so(MARE LIS?! reer’ 
abnay:ous tpe<i[t)}.. (ie(Raser Let (s 5 
ARRAY 8BG<4([2]..1=z>) 

> ARBAY 6BG<1 048 ]..1 p94 ,2>5 


ARRAY DEC ARITH BXP<a..a*>,<d..d*> 
BOUND PAIRA< ale> 
BOUND PAIR<p..a}d>, BPLIST<t..x]y> 
BPLIST<t..ajy>, IBGTR<1> 
« TDOTR<1>, LIST:CORR INDEXED LIST<1,1) 4? 


eeeeae 


a LIST<2. wey? Say, oB--PeR1s¥*s 
ARRAY LISTIARRAT [D8:OUS8 TBA<t..28°8124> + ARRAY DECSRRAL/TUTESER/DOGLBAR ARRAY t..0°>s 
ARRAY LIST: ARRAY IDG:OWE TBO<e..1°tAri> - ARRAY BEC: DCS OUR ABRAYS«< OWN REAL/ LETECER/ BOOL RAS adnay hecttrdrg 


DES UxP<4..a°> > BY LIST<ALTOEQ(4 ,).-ALTBEQ(Ae.a" doy 
TDSTR<i>, SW LIST<2..¢°> + SW DEC<SVITCH fred, deCtmpex_Liset'f* at pos 


IDSTR<1>, PORMAL PAR PART: PARA<f2t,>, VALUE PART:PARSsusu,>, SPRCIPIER PAPT<c>, 
STN: RANK PORMALS<a..8':n>, bisL2:REL COMP<f ru, te >acarmzar>s : 
LICL2:TUTERSRCcare, 14>, LISTICORR URSEARE LEST<x, 21> + PROC STWINANE FORMALE 
<REAL/ISTEGER/ BOOLEAN/A PROCEDURE i f,usc.. Cf Jeter 18,tyoAT ot, 

west; if tals 


TYPE DEC<a..a°> | ARRAY BECee..a*> | Si BEC<@. > | PROC DEC<d..4°> + DEC<a..4'> 
DEC<4. .xay> + DEC BEG<d..REC auyz>; 
DEC<d..xey>, DEC QEQ<s..REC utey'’> + DEC SEQca:d.. REC a, gey'.y>; 


STM SEQ:OWN VARS: OUR ARRAYS<8..3'1¥, 7807, DEC SEQ:OVE VARS: OUR ARRATE:DEC GUE YARS 
:DEC OWN ARRATS<4..4'1 wi rest, sre, .>, LIrL2: REL COMPcE gy Sv 94205 >, 
Bh hy th bis es STR<e> + BLOCK:OUN VARS r1OVE ARBAYS:GLOSAL VARS:GLOBAL ARRATS 
<BEGIN dye ERD c.. LET &’ IN s':¥5ragtwag it gees 


BLOCK<p..p'> | COMPOUND STM<p..p*> > PROGRAN SPR<p..p'>; 
PROGRAM STR<s..5°>, LABEL: VAL<isw> © PROGRAM STR<t* 1 “u,v rats 
PROGRAN STR:SLOBAL: VANS:GLOBAL ABRATS:OWE VARG:0N2 ARRAYS :Rany als 
<8 atevtagrks tA> DIV GuTey LIST<v a LIST: CORR. BULL L1et<v Bar eSegtey> 


* ALOOL PROGRAMN<s..LET vgtgt tgt, 18 ets 


LIST: CORR LIST:COMR BULL LIST<A11>; 
BULL LIST | LIST:CORP BULL LIST<trm>, IPSTR<i> + LIST:CORR-MULL LISP<£,8:°A’ .e>; 
my LISTICOMP URSMARE LIST<3 12> 
LIST:COPR UNSHARE LIGT<irm>, IDSTR<i> + LIST: CORN UNSHARE LIST<1,2:(UNGRARE (1 *A')) mrs 
LIST:CORR INDEXBD 
DEXED LIST LIST: CORR ISDEMED LIST<trm>, IDSTA<i>, UNSIOH INT<J> + LIST:CONM IBBOXBD LIST<1,0218) a>; 




Appendix 4.3 DEFINITION OF PRIMITIVE FUNCTIONS FOR ALGOL/60


Sah dotipitions for string vertadies: | aepterz | r.a.scete | 


prare 
CuaR 


[ 
we(s,8) © [ 
wBale,6) © [ 


comn(s,e,e) =[ 


aBD(e,B) © 


HO? as [ 
De® [ 


18_PO8 a © 
Is_3BG ee 
wees 
pes en 
Islet ae 


MAKB_REAL(e, 8) 


DIGIF<O> <>, oe otP4 i 
LETTER AD <B>, coe ptBr Qt O*P et B°e oye <8“ 
MARK<o> ,<o> © 9%—> 

DIOrr<ps | Lartansp: | MAMK<p> © CRARCD>; 


STR<A>, 
STR<e>, CRARSC> + SBTR«tse>) 


a oo *[aee” 9 @)* } . 
- . raee 

ye ots PALS } aft 
- Patan 

op . s.OSROS ] oft 


TRUE 
PALSE 7 


PALSE/TaVE + yaLak 
PALSE/FALSE +* PALaE 


e 
6 
SRUR/TROR ++ TRUE 
TRUE/PALSE ++ FALSE o/3 
PALSE o- RUB 


¢ 

e 

. 

+ 

. 

My ry 

E ea ee 
-s 

e 


aa eases 
L] 
ter 
ee 
ee 
“ - 
ae | od ee Ed i 
pe Se ee ee ees ee ee | 
e 


a 
Be +o 
apt. bed e 
ry oe 
Ce Pe 
Oe -e 
= { a/t. 7 ott o/s 




(c) Arithmetic conversion primitives (see arithmetic primitives for definitions of
en 


7 ouch + Has 
ena [ ny, + om | fof 


jhaleo . puna 

. sa/tDr. + o/atdo: 

TRANS_PRAC 9 //rdr. + = tDlr "| fas 
/s/ - /e/d 


CORY_TO_REAL a © [ spt. +7 = sbt ] 
ew 8. oe sDl a 
ENTIER X = LET A,B = NUM X, DEN X
           IN /(A,B)
CONV_TO_INT X = ENTIER(+(X, '1D2'))


(d) Arithmetic primitives


/s0/r, + sir 
/ol/r. + e2r 


succ a = 7 Jaf 
/e8/r. +: = aor 
/a9/r. =) Jaf Or 
fle. + ir 


/0/. ~ 0 
/1/9e. 7 or 
sr, + /a/9r 
PRED a ® felj/r. oe ar fa/ 
/e2/r, + alr 


/a9/r. oe Br 


REC ∸(X,Y) = EQ(Y, '0') => X
             ELSE => ∸(PRED X, PRED Y)

REC SUM(X,Y) = EQ(Y, '0') => X
               ELSE => SUM(SUCC X, PRED Y)

LESS(X,Y) = NEQ(∸(Y,X), '0')

REC PROD(X,Y) = EQ(Y, '0') => '0'
                ELSE => SUM(X, PROD(X, PRED Y))

DIFF(X,Y) = LESS(X,Y) => NEGATE(∸(Y,X))
            EQ(X,Y) => '0'
            ELSE => ∸(X,Y)

REC QUOT(X,Y) = LESS(X,Y) => '0'
                ELSE => SUM('1', QUOT(∸(X,Y), Y))
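
As an illustration only, the recursive definitions above can be mirrored in Python. This is a minimal sketch, assuming that numbers are ordinary non-negative Python integers rather than the digit strings of the target language, that PRED of zero is zero, and that NEGATE is arithmetic negation. The names `succ`, `pred`, `monus`, `sum_`, `prod`, `diff` and `quot` are hypothetical stand-ins for SUCC, PRED, the truncated subtraction, SUM, PROD, DIFF and QUOT. The zero-divisor guard in `quot` is an addition that does not appear in the definitions.

```python
def succ(x: int) -> int:
    """SUCC: the next integer."""
    return x + 1


def pred(x: int) -> int:
    """PRED: the previous integer; assumed to stay at 0 for 0."""
    return x - 1 if x > 0 else 0


def monus(x: int, y: int) -> int:
    """Truncated subtraction: x - y, or 0 when y exceeds x."""
    return x if y == 0 else monus(pred(x), pred(y))


def sum_(x: int, y: int) -> int:
    """SUM: move one unit from y to x until y is exhausted."""
    return x if y == 0 else sum_(succ(x), pred(y))


def less(x: int, y: int) -> bool:
    """LESS: x < y exactly when y monus x is not zero."""
    return monus(y, x) != 0


def prod(x: int, y: int) -> int:
    """PROD: repeated addition of x, y times."""
    return 0 if y == 0 else sum_(x, prod(x, pred(y)))


def diff(x: int, y: int) -> int:
    """DIFF: signed difference built from the truncated subtraction."""
    if less(x, y):
        return -monus(y, x)
    if x == y:
        return 0
    return monus(x, y)


def quot(x: int, y: int) -> int:
    """QUOT: integer quotient by repeated subtraction."""
    if y == 0:
        # Guard added for this sketch; without it the recursion never ends.
        raise ZeroDivisionError("quot is undefined for a zero divisor")
    return 0 if less(x, y) else sum_(1, quot(monus(x, y), y))
```

Because every operation recurses one unit at a time, the sketch is only practical for small arguments; it is meant to show the shape of the definitions, not to be an efficient implementation.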


PRI_SUM(X,Y) = AND(IS_INT X, IS_INT Y) => SUM(X,Y)
      ELSE => LET N1,D1,N2,D2 = NUM X, DEN X, NUM Y, DEN Y
              IN LET N = SUM(PROD(N1,D2), PROD(N2,D1))
              IN LET D = PROD(D1,D2)
              IN MAKE_REAL(N,D)


PRI_DIFF(X,Y) = AND(IS_INT X, IS_INT Y) => DIFF(X,Y)
      ELSE => LET N1,D1,N2,D2 = NUM X, DEN X, NUM Y, DEN Y
              IN LET N = DIFF(PROD(N1,D2), PROD(N2,D1))
              IN LET D = PROD(D1,D2)
              IN MAKE_REAL(N,D)


PRI_PROD(X,Y) = AND(IS_INT X, IS_INT Y) => PROD(X,Y)
      ELSE => LET N1,D1,N2,D2 = NUM X, DEN X, NUM Y, DEN Y
              IN LET N = PROD(N1,N2)
              IN LET D = PROD(D1,D2)
              IN MAKE_REAL(N,D)


PRI_QUOT(X,Y) = AND(IS_INT X, IS_INT Y) => QUOT(X,Y)
      ELSE => LET N1,D1,N2,D2 = NUM X, DEN X, NUM Y, DEN Y
              IN LET N = PROD(N1,D2)
              IN LET D = PROD(N2,D1)
              IN MAKE_REAL(N,D)
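
The PRI_ primitives treat a real value as a numerator and a denominator and combine the two parts by cross-multiplication. The following is a minimal sketch of that idea, assuming non-negative values, assuming a real is modelled as a Python pair `(numerator, denominator)`, assuming NUM and DEN of an integer are the integer itself and 1, and assuming MAKE_REAL simply pairs its arguments without reducing the fraction. All function names are hypothetical stand-ins for the primitives of the same name.

```python
from typing import Tuple, Union

Number = Union[int, Tuple[int, int]]  # an integer or a (numerator, denominator) pair


def is_int(x: Number) -> bool:
    return isinstance(x, int)


def num(x: Number) -> int:
    return x if is_int(x) else x[0]


def den(x: Number) -> int:
    return 1 if is_int(x) else x[1]


def make_real(n: int, d: int) -> Tuple[int, int]:
    # Assumption: no reduction to lowest terms is performed.
    return (n, d)


def pri_sum(x: Number, y: Number) -> Number:
    if is_int(x) and is_int(y):
        return x + y
    n1, d1, n2, d2 = num(x), den(x), num(y), den(y)
    return make_real(n1 * d2 + n2 * d1, d1 * d2)


def pri_diff(x: Number, y: Number) -> Number:
    if is_int(x) and is_int(y):
        return x - y
    n1, d1, n2, d2 = num(x), den(x), num(y), den(y)
    return make_real(n1 * d2 - n2 * d1, d1 * d2)


def pri_prod(x: Number, y: Number) -> Number:
    if is_int(x) and is_int(y):
        return x * y
    n1, d1, n2, d2 = num(x), den(x), num(y), den(y)
    return make_real(n1 * n2, d1 * d2)


def pri_quot(x: Number, y: Number) -> Number:
    if is_int(x) and is_int(y):
        return x // y  # integer quotient, as QUOT does for two integers
    n1, d1, n2, d2 = num(x), den(x), num(y), den(y)
    return make_real(n1 * d2, n2 * d1)
```

For example, `pri_sum((1, 2), (1, 3))` yields the unreduced pair `(5, 6)`, and `pri_quot((1, 2), (3, 4))` yields `(4, 6)`.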




SIGN(X,Y) = AND(IS_POS X, IS_POS Y) => 'Λ'
            AND(IS_POS X, IS_NEG Y) => '-'
            AND(IS_NEG X, IS_POS Y) => '-'
            ELSE                    => 'Λ'

+(X,Y) = AND(IS_POS X, IS_POS Y) => PRI_SUM(X,Y)
         AND(IS_POS X, IS_NEG Y) => PRI_DIFF(X, ABS Y)
         AND(IS_NEG X, IS_POS Y) => PRI_DIFF(Y, ABS X)
         ELSE                    => NEGATE(PRI_SUM(ABS X, ABS Y))

×(X,Y) = LET S = SIGN(X,Y)
         IN CAT(S, PRI_PROD(ABS X, ABS Y))

/(X,Y) = LET S = SIGN(X,Y)
         IN CAT(S, PRI_QUOT(ABS X, ABS Y))

-(X,Y) = +(X, NEGATE Y)

÷(X,Y) = LET S = SIGN(X,Y)
         IN CAT(S, ENTIER(ABS(/(X,Y))))


(e) Boolean primitives 


¬X     = NOT X
∧(X,Y) = AND(X,Y)
∨(X,Y) = NOT(AND(NOT X, NOT Y))
⊃(X,Y) = NOT(AND(X, NOT Y))
≡(X,Y) = EQ(X,Y)

PRI_LESS(X,Y) = LET N1,D1,N2,D2 = NUM X, DEN X, NUM Y, DEN Y
                IN LESS(PROD(N1,D2), PROD(N2,D1))

<(X,Y) = AND(IS_POS X, IS_POS Y) => PRI_LESS(X,Y)
         AND(IS_POS X, IS_NEG Y) => FALSE
         AND(IS_NEG X, IS_POS Y) => TRUE
         ELSE                    => PRI_LESS(ABS Y, ABS X)

=(X,Y) = EQ(X,Y)
≠(X,Y) = NEQ(X,Y)
≤(X,Y) = ∨(<(X,Y), =(X,Y))
≥(X,Y) = NOT(<(X,Y))
>(X,Y) = NOT(≤(X,Y))
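
The Boolean primitives build every connective from NOT and AND, and every comparison from a "less than" on magnitudes plus a case analysis on signs. The sketch below illustrates that construction under stated assumptions: Booleans are Python `bool` values, a number is a pair `(numerator, denominator)` with a positive denominator and a possibly negative numerator, zero counts as positive, and equality is plain equality of pairs (so both operands are assumed to be written in the same form). All names are hypothetical.

```python
from typing import Tuple

Rational = Tuple[int, int]  # (numerator, denominator), denominator > 0


def not_(x: bool) -> bool:
    return not x


def and_(x: bool, y: bool) -> bool:
    return x and y


def or_(x: bool, y: bool) -> bool:
    # Disjunction expressed through NOT and AND only.
    return not_(and_(not_(x), not_(y)))


def implies(x: bool, y: bool) -> bool:
    # Implication expressed through NOT and AND only.
    return not_(and_(x, not_(y)))


def is_pos(x: Rational) -> bool:
    return x[0] >= 0


def is_neg(x: Rational) -> bool:
    return x[0] < 0


def abs_(x: Rational) -> Rational:
    return (abs(x[0]), x[1])


def pri_less(x: Rational, y: Rational) -> bool:
    # Compare two non-negative values by cross-multiplication.
    (n1, d1), (n2, d2) = x, y
    return n1 * d2 < n2 * d1


def less(x: Rational, y: Rational) -> bool:
    if and_(is_pos(x), is_pos(y)):
        return pri_less(x, y)
    if and_(is_pos(x), is_neg(y)):
        return False
    if and_(is_neg(x), is_pos(y)):
        return True
    return pri_less(abs_(y), abs_(x))  # both negative: magnitudes swap


def less_equal(x: Rational, y: Rational) -> bool:
    return or_(less(x, y), x == y)


def greater_equal(x: Rational, y: Rational) -> bool:
    return not_(less(x, y))


def greater(x: Rational, y: Rational) -> bool:
    return not_(less_equal(x, y))
```

Deriving `or_` and `implies` from `not_` and `and_` keeps the set of truly primitive operations small, which is the design choice the definitions above appear to follow.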


(f) For statement primitives


ur ASBics = (A 'A'),(B tat), (ce a’) 
AND(IS.| Pos BS LESS(cta' )} => 'at 
AND(IS_HEG Bi Less(ate')) => "a" 


REC STEP(A,B,C) = 


REC WHILE(A,B) # 


REC DELAY_CAT L = 


REC FOR(V,L,S) = 


ELSE 


at, an. (STBP(As, (#(A$B$)),3,C) )} 


LET ayB' = (A °A'),(B °A*) 
IN WOT BY => ‘At 


ELSE 


So, 


* an (WHILE (A,B))) 


LET H,T © HD L, TLL 
gts (H ta") 


I” LET 
In 


EQ(T, 
ELSE 


‘'a’) = HH 


EQ(UH! 'a') 2 (DELAY_CAT T) 
de 


¢ 


LET H,T = HD L, TL L 


In EQ(L, 
ELSE 


"° 


) 


=> ‘A e 

IS_INT YW) (W ASSIGN. (CONY TO_INT H)) ELSE = 
@ dQ Wa stson, (CONV _T0_REAL &)7; 

(S "Ads 

FoR (¥, (DELAY_CAT T), S) 




(g) Array and list primitives


GET_EL(I,L) = { r(I,a)t. + 8 JL 
RESET_EL(I,L,X) © { r({I,e)t. ++ r(I,x)t JL 
REC INDEX_LIST(I,L) = LET H,T = ADL, TL L 


IN NULL T => (T,H) 
ELSE «= - A(T, WH), IwDEX_List(+(1,1), T)] 


REC LAST L = LET H,T = HD L, TL L 
IN NULL T =>H 
ELSE => LAST T 


REC TRUNC L = LET H,T = HD L, TL L 
Im NULL(TL T) =>HD T 
ELSE =>fh, TRuNc 7 


REC ADD1(SUBSLIST,LB,UB) ~ LET S),S,,53,7),T,,T, = LAST SUBSLIST,LAST LB,LAST UB,TRUNCLB,TRUNC UB 
IN WEQ(S, ,S4) >f,, ((s,, nog 


ELSE =>pvni(T, 72,73), SJ] 
REC MAKE _LIST(I,LB,UB) = EQ(1,UB) => (1, 'A") 
ELSE => (1, 'A") , MAKE_LIST((ADD1(I,LB,UB)), LB, UB J) 
REC RESET_LIST - 2Q(J,UB) => (J, GET_EL(J, ARRAY)) 
(ARRAY, J,LB,UB) ELSE oy, GET _EL(J, ARRAY) ) RESET_LIST((ADD1(J,LB,UB)),LB,uB)] 




Appendix 5. THEORETICAL BACKGROUND 
FOR CANONICAL SYSTEMS 


The intent of this appendix is (a) to describe and
relate the formalisms of Post's formal systems and
Smullyan's "elementary formal" systems, (b) to show that
the formalism of "canonical" systems presented in this
dissertation is equivalent (except for changes in notation)
to Smullyan's elementary formal systems, and (c) to show that
the terminology and interpretation of canonical systems
given here relate to the terminology and interpretation of
the formal systems of Post and Smullyan.

A formal system will be described by giving

(a) A set A of primitive symbols: For example, this set may
    be the symbols {0 1 ... 9} or the set of characters in
    a computer language.

(b) A set C of auxiliary symbols:* For example, this set
    may include the symbols {SQ + =}.

(c) A set S of initial statements composed from the primitive
    and auxiliary symbols: The set S will be composed of
    strings from A∪C.**

(d) A set E of well-formed expressions: The set of well-formed
    expressions will generally incorporate symbols
    from A∪C and other symbols.

(e) A series of rules for using the well-formed expressions:
    The rules will be used to derive new statements containing
    the primitive symbols from the set S of initial
    statements.

(f) An interpretation of the formal system: Strictly speaking,
    an interpretation is not part of a formal system.
    An interpretation is placed on a formal system by a user
    who wishes to draw conclusions about the objects that
    the symbols of the system represent.

*All sets of symbols in the systems of Post and Smullyan are
assumed to be disjoint from each other.

**The symbol "∪" denotes the binary operation of set union.


POST'S SYSTEMS 


(a) Primitive Symbols
    Let A be a finite set of symbols {A₁ A₂ ... Aₚ}.

(b) Auxiliary Symbols
    Let C be a finite set of symbols {C₁ C₂ ... Cᵣ}.

Let L be the set A∪C, the union of the sets A and C. Post
calls the set L the set of "primitive letters" and does not
distinguish the sets A and C. The sets A and C are distinguished
here to clarify the distinction between a Post system
and a Smullyan elementary formal system.

(c) Initial Statements
    The initial statements S are a set {S₁ S₂ ... Sₖ} where
    each Sᵢ, 1≤i≤k, is a string of letters from L.


(d) Well-formed Expressions
    Let V be a finite set of symbols {V₁ V₂ ... Vₜ} called
    variables.

    A premise is a string of symbols from L∪V.

    A conclusion is a string of symbols from L∪V.

    A well-formed expression is a string of the form
    "Q₁, Q₂, ..., Qₘ produce C" where the Qᵢ, 1≤i≤m,
    are premises and C is a conclusion such that each
    variable in C also occurs in at least one Qᵢ. A
    well-formed expression is called a production.

A set E is a system in canonical form if E is a finite set
{P₁ P₂ ... Pₙ} where each Pᵢ, 1≤i≤n, is a production.


(e) Rules for Using Well-formed Expressions

Rule 1: A string X is called an instance* of a production Pᵢ
        if X can be obtained from Pᵢ by substituting for
        each variable in Pᵢ some string (possibly null) of
        letters from L. The string substituted for each
        occurrence of the same variable must be the same.

Rule 2: If each premise in an instance of a production has
        been derived, then the conclusion of that instance
        can be derived.

The statements derivable from a Post system are

    (a) The initial statements

    (b) The statements that can be derived from the
        productions by first applying Rule 1 to obtain
        an instance of the production and then applying
        Rule 2 to the production instance.

*The word "instance" is not used by Post.

(f) Interpretation
    A production can be viewed as a rewriting rule for obtaining
    new statements from previously derived statements.
    The interpretation of the derived statements is subject
    to the interpretation of the initial letters.


Example 1: A Post System Defining the Set of Squares of
           Positive Integers

(a) Primitive Symbols        A = {1}

(b) Auxiliary Symbols        C = {SQ}
                             L = {1 SQ}

(c) Initial Statements       S = {1SQ1}

(d) Well-formed Expressions  V = {u v}
                             E = {uSQv → u1SQuuv1}

(e) Derived Statements       {1SQ1 11SQ1111 111SQ111111111 ...}

(f) Interpretation
    The string of ones occurring to the left of "SQ" represents
    the positive integer denoted by the number of ones.

    The string of ones occurring to the right of "SQ" represents
    the positive integer that is the numerical
    square of the integer to the left of "SQ".
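
The derivation in Example 1 can be traced mechanically. The following is a minimal sketch, not a general Post-system interpreter: it assumes the single production uSQv → u1SQuuv1 and exploits the fact that every derived statement contains exactly one "SQ", so the values substituted for u and v can be read off by splitting on it. The function name `derive_squares` is hypothetical.

```python
from typing import List


def derive_squares(steps: int) -> List[str]:
    """Return the initial statement plus `steps` further derived statements."""
    statements = ["1SQ1"]  # the initial statement S
    for _ in range(steps):
        # Rule 1: match the premise uSQv against the latest statement.
        u, v = statements[-1].split("SQ")
        # Rule 2: emit the conclusion u1SQuuv1 of that instance.
        statements.append(u + "1" + "SQ" + u + u + v + "1")
    return statements


if __name__ == "__main__":
    for statement in derive_squares(3):
        left, right = statement.split("SQ")
        print(statement, "->", len(left), "squared is", len(right))
```

Each step relies on the identity (n+1)² = n² + 2n + 1: the new right-hand string is the old one (n² ones) preceded by two copies of u (2n ones) and followed by a single one.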


Example 2: Another Post System Defining the Set of Squares
           of the Positive Integers

Note: The intent of this example is to illustrate that the
      "canonical systems" given in this dissertation fit
      the definition of a system in canonical form given by
      Post.

(a) Primitive Symbols        A = {1}

(b) Auxiliary Symbols        C = {N:SQ < > :}
                             L = A∪C = {1 N:SQ < > :}

(c) Initial Statements       S = {N:SQ<1:1>}

(d) Well-formed Expressions  V = {u v}
                             E = {N:SQ<u:v> → N:SQ<u1:uuv1>}

(e) Derived Statements
    {N:SQ<1:1> N:SQ<11:1111> N:SQ<111:111111111> ...}

(f) Interpretation
    The string "N:SQ" is the name of a set.

    Each string "<x:y>", where x and y are strings of ones,
    is a member of the set "N:SQ".

    The string of ones before the ":" represents a positive
    integer; the string of ones to the right of the ":"
    represents the square of the positive integer to the
    left of the ":".


SMULLYAN'S "ELEMENTARY FORMAL" SYSTEMS

Smullyan's elementary formal systems are a descendant of Post's
formal systems.

(a) Primitive Symbols
    Let A be a finite set of symbols {A₁ A₂ ... Aₚ} called
    the object alphabet.

(b) Auxiliary Symbols
    Let P be a set of symbols {P₁ P₂ ...} called the predicate
    alphabet. With each predicate alphabet symbol we
    associate a unique positive integer called its degree.
    Let Z be the set {, →}. The symbol "→" is called the
    "implication" sign and the symbol "," is called the
    "punctuation" sign.

    The set C of auxiliary symbols is the set P∪Z.

(c) Initial Statements - None
    Smullyan includes the initial statements as members of
    the set of well-formed expressions.

(d) Well-formed Expressions
    Let V be a set of symbols {v₁ v₂ ...} called the set of
    variables.

    A term is a string from V∪A.



    A well-formed atomic formula is a string of the form
    "Pt₁,t₂,...,tₖ" where tᵢ, 1≤i≤k, are terms and P is
    a predicate of degree k.

    A well-formed expression is either an atomic formula or
    an expression of the form X₁ → X₂ ... → Xₘ (assuming
    association to the right; e.g., "X₁ → X₂ → X₃" is to
    be read "X₁ implies (X₂ implies X₃)") where Xᵢ,
    1≤i≤m, are atomic formulas.* A well-formed expression
    is called a well-formed formula.

    A set E is an elementary formal system if E is a finite
    set {F₁ F₂ ... Fₙ} where the Fᵢ, 1≤i≤n, are well-formed
    formulas, called axioms.


(e) Rules for Using Well-formed Expressions

Rule 1: (Substitution) A formula F' can be derived from a
        formula F by substitution if F' can be obtained from
        F by substituting a string in A for each occurrence
        of some variable in F.**

Rule 2: (Modus Ponens) A formula F' can be derived from a
        formula F by modus ponens if F is of the form X → F'
        and X is some previously derived atomic formula.
        More generally, a formula Xₙ can be derived from a
        formula of the form X₁ → X₂ → ... → Xₙ₋₁ → Xₙ if each
        Xᵢ, 1≤i≤n, is an atomic formula and X₁, X₂, ..., Xₙ₋₁
        have each been previously derived.*** In this case,
        we refer to X₁, X₂, ..., and Xₙ₋₁ as premises,
        Xₙ as a conclusion, and say that the conclusion Xₙ is
        derivable from the conjunction of the premises
        X₁, X₂, ..., and Xₙ₋₁.

The "provable strings" of an elementary formal system E are
    (i)  the axioms of E
    (ii) the strings that can be derived from the axioms by
         a finite number of applications of rules 1 and 2.

*Note that no restriction is placed on the use of a variable
occurring in Xₘ but not in Xᵢ, 1≤i≤m-1.

**In an elementary formal system, it is not necessary to
substitute object strings for each variable in a formula to
derive strings from the well-formed formulas. Thus we can
derive strings containing variables in an elementary formal
system. In a Post system, we must substitute object strings
for each variable in a production before we can derive strings.

***If each variable is replaced by an object string, this
generalization of modus ponens is identical to rule 2 for
deriving strings given by Post.



An instance of a well-formed formula F is a string obtained 
from F by applying rule 1 (substitution) to all variables in 
F. A formula so obtained is called a sentence. 


The "provable sentences" of an elementary formal system E are 
the provable strings containing no variables. 


(f) Interpretation
    Let P be a predicate of degree k in an elementary formal
    system E, and let Y be a set of k-tuples of strings from
    A. We say that the predicate P represents the set Y if
    the following condition holds: "Px₁,x₂,...,xₖ" is a
    provable sentence in E if and only if the k-tuple
    (x₁,x₂,...,xₖ) is contained in Y.


Thus an elementary formal system can be viewed as a set of 
axioms used to enumerate the members of sets whose names are 
denoted by the predicates. 


Example 3: An Elementary Formal System Defining the Set of
           Squares of the Positive Integers

(a) Primitive Symbols        A = {1}

(b) Auxiliary Symbols        P = {R}   Z = {, →}

(d) Well-formed Expressions  V = {u v}
                             E = {R1,1  Ru,v → Ru1,uuv1}

(e) Derived Statements
    {R1,1  R11,1111  R111,111111111 ...}
    The derived statements given above are (in the Smullyan
    sense) the atomic sentences derived from E.

(f) Interpretation
    If R is the name of a set, the ordered pairs
    {(1,1) (11,1111) (111,111111111) ...} are the members of
    R. We interpret the set R as containing all ordered pairs
    such that the string to the left of the "," represents a
    positive integer and the string to the right of the ","
    represents the positive integer that is the square of the
    integer represented by the string of ones to the left of
    the ",".
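
The two Smullyan rules can be exercised on the system of Example 3. The sketch below is an illustration under simplifying assumptions, not a general prover: an atomic formula is modelled as a pair of a predicate name and a tuple of terms, variables are the single characters "u" and "v", and the one non-atomic axiom has a premise whose terms are bare variables, so matching a proved sentence against the premise reduces to pairing variables with strings. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (predicate, terms)

AXIOM_FACT: Atom = ("R", ("1", "1"))                      # R1,1
AXIOM_RULE: Tuple[Atom, Atom] = (("R", ("u", "v")),       # Ru,v →
                                 ("R", ("u1", "uuv1")))   #   Ru1,uuv1


def substitute(term: str, env: Dict[str, str]) -> str:
    """Rule 1: replace each variable in a term by its assigned string."""
    return "".join(env.get(ch, ch) for ch in term)


def provable_sentences(rounds: int) -> List[Atom]:
    """Apply substitution and modus ponens `rounds` times."""
    premise, conclusion = AXIOM_RULE
    proved = [AXIOM_FACT]
    frontier = [AXIOM_FACT]
    for _ in range(rounds):
        fresh = []
        for pred, args in frontier:
            if pred != premise[0]:
                continue
            env = dict(zip(premise[1], args))  # u, v := the proved strings
            # Rule 2: the premise instance is proved, so emit the conclusion.
            sentence = (conclusion[0],
                        tuple(substitute(t, env) for t in conclusion[1]))
            if sentence not in proved:
                proved.append(sentence)
                fresh.append(sentence)
        frontier = fresh
    return proved


def show(atom: Atom) -> str:
    """Write an atom in the notation of Example 3, e.g. R11,1111."""
    return atom[0] + ",".join(atom[1])
```

Calling `[show(s) for s in provable_sentences(2)]` reproduces the first three derived statements listed in the example.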




CANONICAL SYSTEMS (as presented in this dissertation) 


The formalism called "canonical systems", as presented in 
this dissertation, is equivalent (except for changes in nota- 
tion) to Smullyan's elementary formal systems. 


(a) Primitive Symbols: In this dissertation the primitive
    or "object" alphabet is the set of characters used in
    some computer language.

(b) Auxiliary Symbols: Each predicate in the predicate
    alphabet P here is a sequence of strings of English
    letters or digits, separated by the tuple sign ":".
    Each string of English letters or digits is called a
    predicate part, and the number of predicate parts in a
    predicate is usually identical to the number of terms in
    the term tuple following the predicate. The separation
    of predicates into parts is made (a) to give some mnemonic
    describing the role of each term in a term tuple following
    the predicate, and (b) to provide a convenient notation
    for abbreviating a canonical system.

    The set Z is given as {: →} rather than {, →} since
    the comma "," is a character occurring frequently in
    computer languages.

(d) Well-formed Expressions: A well-formed formula
    "X₁ → X₂ → ... → Xₙ₋₁ → Xₙ" is written here as
    "X₁, X₂, ..., Xₙ₋₁ → Xₙ" to connote the meaning that
    Xₙ is derivable from a canonical system if and only if
    each of the instances of the premises X₁, X₂, ..., Xₙ₋₁
    is derivable. This alternate formulation is in the
    spirit of Post.

    The delimiter ";" is introduced here to separate the
    well-formed formulas of a canonical system. The well-formed
    formulas in a Smullyan system are separated by
    the use of appropriate spacing of formulas in a page of
    text.

    Furthermore, the string of terms following a predicate
    is enclosed by the angle brackets "<" and ">" so that the
    characters ",", ";" and "→" can be used in the terms as
    object symbols without the use of quotation marks.

(e) Rules for Using Well-formed Expressions: The rules for
    using well-formed productions of a canonical system are
    identical to the rules used by Smullyan.

(f) Interpretation: The interpretation given to a canonical
    system here is a hybrid of the interpretation of the
    systems of Post and Smullyan:

    (i)  The productions of a canonical system are viewed
         as rewriting rules (Post).

    (ii) The derived strings of a canonical system are
         viewed as statements about the membership of
         n-tuples of strings in sets whose names are given
         by the predicates (Smullyan).

REFERENCES 


The following works describe the theoretical foundations of
canonical systems:


1. Emil L. Post 
Formal Reductions of the General Combinatorial 
Decision Problem 
American Journal of Mathematics, Volume 65, pp. 197- 
215, 1943. 


2. Raymond M. Smullyan 
Theory of Formal Systems 
Annals of Mathematical Studies, Number 47, Princeton 
University Press, Princeton, New Jersey, 1961. 


The following references describe work on applications of 
canonic systems to computer languages: 


3. John J. Donovan
Investigations in Simulation and Simulation Languages,
Ph.D. dissertation, Yale University, New Haven,
Connecticut, 1966.
This reference adapts Smullyan's system to specify
the syntax of computer languages, and introduces
the term "canonic systems" to describe the resulting
variant.


4. Henry F. Ledgard
A Scheme for the Translation of Computer Languages,
Ph.D. dissertation proposal, M.I.T., Cambridge,
Massachusetts, 1967.
This reference applies canonic systems to define
both the syntax of a computer language and its
translation into a target language.


5. John J. Donovan and Henry F. Ledgard 

A Formal System for the Specification of the Syntax 
and Translation of Computer Languages 

AFIPS, Proceedings of the 1967 Fall Joint Computer 
Conference, Volume 31, Thompson Books, Washington, 
D.C., 1967. 

This reference also considers the use of canonic 
systems to define the syntax and translation of a 
computer language. 




6. Joseph W. Alsop
A Canonic Translator
MAC-TR-46, Project MAC, M.I.T., 1967.
This reference describes an algorithm that uses a
canonic system specification of a language as a
data base to recognize strings specified by the
canonic system and generate their translation.


7. James T. Doyle
Issues of Undecidability in Canonic Systems, S.M.
dissertation, M.I.T., Cambridge, Massachusetts,
1968.


8. Joseph P. Haggerty 


The following is the basic reference for Markov algorithms: 


9. Andrei A. Markov
Theory of Algorithms
Academy of Sciences of the USSR, Moscow, 1954, English
Translation by Israel Program for Scientific Translations.


The following describe the extension of Markov algorithms
used in this dissertation:


10. A. Caracciolo di Forino
Generalized Markov Algorithms and Automata
Lecture delivered at the International Summer School
of Physics Course on Automata Theory, Ravello,
Italy, 1964.


11. A. Caracciolo di Forino and N. Wolkenstein
On a Class of Programming Languages for Symbol
Manipulation based on Extended Markov Algorithms,
Centro Studi Calcolatrici Elettroniche del C.N.R.,
Pisa, Italy, 1963.


12. A. Caracciolo di Forino
String processes and generalized Markov algorithm
in Symbol Manipulation Languages and Techniques,
North-Holland Publishing Company, Amsterdam, 1968.



The following are other references on Markov algorithms: 


13. Anton P. Zeleznikar 
Some Algorithm Theory and its Applicability 
American Mathematical Society Translations, Series
2, Volume 1 pp. 141-15 1963.
This reference describes a 2-dimensional variant of
Markov algorithms.


14. V. K. Detlovs
The Equivalence of Normal Algorithms and Recursive
Functions
American Mathematical Society Translations, Series
2, Volume 23, pp. 15-81, 1963.


15. V. S. Cernjavskii
On a Class of Normal Markov Algorithms
American Mathematical Society Translations, Series
2, Volume 46, pp. 1-35, 1965.


16. L. A. Kaluzhnin
Algorithmization of Mathematical Problems
Problems of Cybernetics, Volume 2, pp. 371-391, 1961.
This reference analyzes the advantages and shortcomings
of Markov algorithms.


The following are the basic references on the λ-calculus:


17. Alonzo Church
The Calculi of Lambda-Conversion
Annals of Mathematical Studies, Number 6, Princeton
University Press, Princeton, New Jersey, 1941.


18. Haskell B. Curry and Robert Feys 
Combinatory Logic, Volume I, North-Holland Publishing 
Company, Amsterdam, 1958. 


The following references describe the theory and application
of the λ-calculus:


19. Peter J. Landin
A Formal Description of ALGOL 60
Formal Language Description Languages for Computer
Programming, North-Holland Publishing Company,
Amsterdam, 1966.


20. Peter J. Landin
The λ-Calculus Approach
Advances in Programming and Non-Numerical Computation,
Pergamon Press, New York, 1966.



21. Peter J. Landin 
A Correspondence Between ALGOL 60 and Church's Lambda- 
Notation 
Communications of the ACM, Volume 8, Numbers 2 and 
3, February 1965. 


22. Christopher Strachey
Towards a Formal Semantics
Formal Language Description Languages for Computer
Programming, North-Holland Publishing Company,
Amsterdam, 1966.


23. C. Bohm
The CUCH as a Formal and Description Language
Formal Language Description Languages for Computer
Programming, North-Holland Publishing Company,
Amsterdam, 1966.


24. Arthur Evans, Jr.
Class notes for Linguistic Structures, Subject 6.688,
M.I.T., Fall Term, 1966.
These notes are based on class lectures given by
Peter Landin.


25. John M. Wozencraft
Class notes for "Programming Linguistics," Subject
6.231, M.I.T., Spring Term, 1968.


26. James H. Morris
Lambda-Calculus Models of Programming Languages, Ph.D.
dissertation, M.I.T., December 1968.


The following references describe the computer languages 
SNOBOL/1 and ALGOL/60. 


27. David J. Farber, Ralph E. Griswold, and I. P. Polonsky 
SNOBOL, A String Manipulating Language 
Journal of the ACM, Volume 11, Number 2, pp. 21-30, 
1964.


28. Peter Naur (Editor) 
Revised Report on the Algorithmic Language ALGOL 
60 
Communications of the ACM, Volume 6, Number 1, pp. 
1-23, 1963. 




The following references have also been used: 


29. Peter E. Lauer
The Formal Explicates of the Notion of An Algorithm,
Technical Report 25.072, IBM Laboratory Vienna,
February, 1967.
This reference explains and relates formalisms (including
Post's systems, Markov algorithms, and
the λ-calculus) related to the theory of computability.


30. A. M. Turing
On Computable Numbers with an Application to the
Entscheidungsproblem
Proceedings of the London Mathematical Society,
Volume 42, pp. 230-265, 1936.


31. A. M. Turing
Computability and Lambda-Definability
Journal of Symbolic Logic, Volume 4, pp. 153-160,
1937.


32. Stephen C. Kleene
Lambda-Definability and Recursiveness
Duke Mathematical Journal, Volume 2, pp. 340-353,
1936.


33. V. K. Detlovs
The Equivalence of Normal Algorithms and Recursive
Functions
American Mathematical Society Translations, Series
2, Volume 23, pp. 15-81, 1963.


34. Marvin L. Minsky
Computation: Finite and Infinite Machines, Prentice-
Hall, Inc., Englewood Cliffs, New Jersey, 1967.


35. Noam Chomsky
On Certain Formal Properties of Grammars
Information and Control, Volume 2, Number 4, pp.
393-395, 1959.


36. Alfred B. Manaster
Class notes for "Introduction to Mathematical Logic,"
Subject 18.886, M.I.T., Spring Term, 1967.


37. Thomas B. Steel, Jr. (Editor)
Formal Language Description Languages for Computer
Programming, North-Holland Publishing Company,
Amsterdam, 1966.



38. 


39. 


4O. 


hi, 


ho, 


43. 


Fa ee Ba) Seon ee ee ee ES ee et eS ees 


Trenchard More ; 
Relations Between Simplicational Caliculi, Ph.D. 
dissertations, M.I.T., Cambridge, Massachusetts, 
1962, 


Calvin N. Mooers 
How Some Fundamental Problems are Treated in the 
Design of the TRAC Language 
Symbol Manipulation Languages Techniques, North- 
Holland Publishing Company, Amsterdam, 1968. 


Joseph Weizenbaun 
ELIZA ~ A Computer Program for the Study of Natural 
Language Communication between Man and Machine 
Communications of the ACM, Volume 9, Number 1, pp. 
36-45 r) 1966. 


Jerome A, Feldman 
A Formal Semantics for Computer Languages and its 
Application to a Compiler-Compiler 
Communications of the ACM, Volume 9, Number 1, 1966. 


A Programmer's Introduction to the IBM System 1360 
Architecture nstructions, and Assembler Language, 
International Business Machines Corporation, White 


Plains, New York, 1965. 


Francis J. Russo 


A Heuristic Approach to Alternate Routing in a Job 
Shop ~ a 


MAC-TR-19, Project MAC, M.I.T., 1965. 


BIOGRAPHICAL NOTE 


Henry Francis Ledgard greeted Lowell, Massachusetts, on 
February 22, 1943. He graduated from Keith Academy of Lowell 
in 1960 and received a Bachelor of Science degree (magna 
cum laude) from Tufts University in 1964. While at Tufts, 
he was elected president of the Tufts Tau Beta Pi chapter, 
which received the "Outstanding Chapter of the Year Award" 
in 1963. Honors during his matriculation included the "Amos 
E. Dolbear Award for Excellence in Electrical Engineering" 
and the "Award for Outstanding Service to the Tufts Community." 

After graduating from Tufts, the author began studies in 
computer science at Massachusetts Institute of Technology, 
where he received the degree of Master of Science in 1965 
and the degree of Electrical Engineer in 1967. While at 
M.I.T. the author was associated with Bell Laboratories and 
Massachusetts General Hospital. In 1965 he became a member 
of the staff of the Electrical Engineering Department, first 
as a teaching assistant, and later as a research assistant 
in which capacity he undertook the research presented in this 
dissertation. 

The author likes northwest days, snow, music, cats,
Monhegan Island, politics, working hard, and playing hard.


UNCLASSIFIED 
Security Classification 


DOCUMENT CONTROL DATA - R&D 


(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified)


1. ORIGINATING ACTIVITY (Corporate author): Massachusetts Institute of Technology
2a. REPORT SECURITY CLASSIFICATION: UNCLASSIFIED
2b. GROUP: None


3. REPORT TITLE


A Formal System for Defining the Syntax and Semantics of Computer Languages 


4. DESCRIPTIVE NOTES (Type of report and inclusive dates)


Ph.D. Thesis, Department of Electrical Engineering, February 1969 


5. AUTHOR(S) (Last name, first name, initial)


Ledgard, Henry F. 


6. REPORT DATE: April 1969
7a. TOTAL NO. OF PAGES: 204
7b. NO. OF REFS: 43


8a. CONTRACT OR GRANT NO.: Office of Naval Research, Nonr-4102(01)
9a. ORIGINATOR'S REPORT NUMBER(S): MAC-TR-60 (THESIS)
b. PROJECT NO.: NR-048-189
9b. OTHER REPORT NO(S) (Any other numbers that may be assigned this report)
RR 0003-09-01


10. AVAILABILITY/LIMITATION NOTICES


This document has been approved for public release and sale; 
its distribution is unlimited. 


11. SUPPLEMENTARY NOTES: None
12. SPONSORING MILITARY ACTIVITY: Advanced Research Projects Agency,
3D-200 Pentagon, Washington, D.C. 20301


13. ABSTRACT


The thesis of this dissertation is that formal definitions of the syntax 
and semantics of computer languages are needed. This dissertation investigates 
two candidates for formally defining computer languages: 

(1) the formalism of canonical systems for defining the syntax of a 
computer language and its translation into a target language, and 

(2) the formalisms of the λ-calculus and extended Markov algorithms as a 
combined formalism used as the basis of a target language for defining the 
semantics of a computer language. 


Formal definitions of the syntax and semantics of SNOBOL/1 and ALGOL/60 
are included as examples of the approach. 


14. KEY WORDS


Computers
Computer languages
Machine-aided cognition
Multiple-access computers
On-line computer
Real-time computers
Syntax and semantics
Time-sharing
Time-shared computers


DD Form 1473 (M.I.T.) UNCLASSIFIED


Security Classification 


